Microsoft Tracking Search and Browsing Behavior to Find Authoritative Pages

Between December 2005 and April 2006, researchers from Microsoft collected information, with permission, about the searching and browsing activities of hundreds of thousands of Windows Live Toolbar users. Their aim was to learn about the final destination pages, sometimes unranked and unindexed, that searchers reached after entering queries at Google, Yahoo, and Microsoft's Live.com. …

Yahoo on Segmenting Web Sites into Topical Hierarchies

On one level, a search engine indexes a web site by crawling it one URL at a time, collecting information about what it finds at each address, and indexing that information so it can be served to searchers later. But the process can be more complicated than that. For instance, a search …

New Google Process for Detecting Near Duplicate Content

A new patent application on near duplicate content from Google explores using a combination of document similarity techniques to keep searchers from finding redundant content in search results. The Web makes it easy for words to be copied and spread from one page to another, and the same content may be found at more than …
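The excerpt does not say which similarity techniques the application combines. As an illustration of one well-known document similarity measure, here is a minimal sketch of word shingling with Jaccard similarity. The function names and the 0.5 threshold are hypothetical choices for this example, and this is not presented as Google's method.

```python
import re

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (contiguous word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_near_duplicate(doc1: str, doc2: str, threshold: float = 0.5) -> bool:
    # Hypothetical cutoff: pairs at or above it are treated as near duplicates.
    return jaccard(shingles(doc1), shingles(doc2)) >= threshold

a = "The web makes it easy for words to be copied and spread"
b = "The web makes it easy for words to be copied and shared"
print(round(jaccard(shingles(a), shingles(b)), 3))  # 0.818
```

Changing one word alters only the shingles that contain it, so the two sentences still share 9 of their 11 distinct shingles. That tolerance to small edits is why shingle overlap is a common starting point for near duplicate detection.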

Searching Future Events Using Yahoo News

Imagine exploring millions and millions of news pages and other documents to find information about events that are scheduled to happen in the future, to help predict the future. This kind of future search, or future retrieval, might be able to support the making of decisions in many different fields. News information could be used …

Google Omits Needless Words (Boilerplate)

Computer programmers will sometimes use the term “boilerplate” code to refer to standard stock code that they often insert into programs. Lawyers use legal boilerplate in contracts – often the small print on the back of a contract that doesn’t change regardless of what a contract is about. A lot of web pages and documents …
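The excerpt is cut off before it describes how boilerplate is identified. One simple heuristic, assumed here purely for illustration and not taken from Google's filing, treats text blocks that recur on most pages of a site (navigation bars, footers) as boilerplate and drops them. The function names, the block-level page representation, and the 0.5 fraction are all hypothetical.

```python
from collections import Counter

def find_boilerplate(pages: list[list[str]], min_fraction: float = 0.5) -> set[str]:
    """Return text blocks that appear on at least min_fraction of the pages."""
    counts: Counter[str] = Counter()
    for blocks in pages:
        counts.update(set(blocks))  # count each block at most once per page
    cutoff = min_fraction * len(pages)
    return {block for block, n in counts.items() if n >= cutoff}

def strip_boilerplate(pages: list[list[str]], min_fraction: float = 0.5) -> list[list[str]]:
    """Remove the recurring blocks from every page, keeping the rest in order."""
    boilerplate = find_boilerplate(pages, min_fraction)
    return [[b for b in blocks if b not in boilerplate] for blocks in pages]

# Hypothetical site: each page is a list of text blocks.
pages = [
    ["Site navigation: Home | About | Contact", "Near duplicate content patent", "Copyright notice"],
    ["Site navigation: Home | About | Contact", "Searching future events", "Copyright notice"],
    ["Site navigation: Home | About | Contact", "Omitting needless words", "Copyright notice"],
]
print(strip_boilerplate(pages))
```

Counting each block once per page keeps a phrase repeated within a single page from being mistaken for site-wide boilerplate.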