And how do you grab a random page from that search engine?
A new Google employee, Ziv Bar-Yossef, gave a presentation at Google on August 17th answering those questions. It is available as a Google Techtalk video: Random Sampling from a Search Engine’s Index (video).
Ziv Bar-Yossef was most recently at the Technion – Israel Institute of Technology and, as noted in the video, became a Google employee a couple of weeks ago. Before the Technion, he was a researcher at the IBM Almaden Research Center.
The presentation is based on a paper that won the 2006 International World-Wide Web Conference Best Paper Award: Random Sampling from a Search Engine’s Index.
Being able to grab random pages from a search engine’s index can provide some interesting information about that search engine. The presentation compares things such as the number of dead pages in Google, MSN, and Yahoo, as well as the freshness of text on each, and what percentage of dynamic pages they have indexed.
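To make the idea concrete, here is a minimal sketch of how a random sample could be turned into an estimate such as "what fraction of indexed pages are dead." It is not the method from the paper or the talk. It assumes you already have a uniform random sample of pages, and the `estimate_fraction` and `is_dead` names, the example URLs, and the status codes are all made up for illustration.

```python
import math


def estimate_fraction(sample, predicate, z=1.96):
    """Estimate the fraction of items in `sample` that satisfy `predicate`.

    Returns (estimate, lower, upper), where lower/upper form a
    normal-approximation confidence interval (z=1.96 is roughly 95%).
    Assumes `sample` was drawn uniformly at random from the index.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    hits = sum(1 for item in sample if predicate(item))
    p = hits / n
    margin = z * math.sqrt(p * (1 - p) / n)
    # Clamp the interval to the valid range for a proportion.
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical sample: (url, http_status) pairs collected after fetching
# each sampled page. These values are invented for the example.
sample = [
    ("http://example.com/a", 200),
    ("http://example.com/b", 404),
    ("http://example.com/c", 200),
    ("http://example.com/d", 200),
]


def is_dead(page):
    """Treat any 4xx/5xx response as a dead page (an assumption)."""
    _, status = page
    return status >= 400


p, lower, upper = estimate_fraction(sample, is_dead)
print(f"dead fraction ~ {p:.2f} (interval {lower:.2f} to {upper:.2f})")
```

With only four pages the interval is very wide, which is the point: the estimate is only as good as the size and the uniformity of the sample. The same function could be reused with a different predicate, for example one that flags dynamic URLs.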
Patents and patent applications in the US from Ziv Bar-Yossef:
- Method and system for improving data quality in large hyperlinked text databases using pagelets and templates (US Patent 6,968,331)
- System, method, and service for using a focused random walk to produce samples on a topic from a collection of hyper-linked pages (US Patent Application 20060122998)
- Methods and apparatus for assessing web page decay (US Patent Application 20060112089)
- Method and system for improving data quality in large hyperlinked text databases using pagelets and templates (US Patent Application 20030140307)