Google Patent Granted on Duplicate Content Detection in a Web Crawler System

Some patents from the search engines provide detailed looks at how those search engines might perform some of the core functions behind how they work. By “core functions,” I mean some of the basics such as crawling pages, indexing those pages, and displaying the results to searchers.

For example, last December I wrote a post titled Google Patent on Anchor Text and Different Crawling Rates, about a Google patent filed in 2003 which gave us a look at how the search engine crawled web pages, and collected the web addresses, or URLs, of pages that it came across.

The patent the post covered was Anchor tag indexing in a web crawler system, and it revealed how Google may determine how frequently it might visit or revisit certain pages, including crawling some pages daily, and others even on a real-time or near real-time basis – every few minutes in some cases. While there’s been a lot of discussion online in the past few months about real-time indexing of web pages, it’s interesting to note that the patent was originally filed in 2003.

That older patent also covered topics such as how a search engine crawler might handle temporary (302) redirects differently than permanent (301) redirects, by noting and sometimes following the temporary redirects immediately (to make a decision as to what page to show in search results), and collecting the URLs associated with permanent redirects and putting them into a queue where they might be addressed later – up to a week or more later.
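The difference in redirect handling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the patent: the `RedirectHandler` class, its `follow_now` list, and its `deferred` queue are hypothetical names, and the sketch always follows a temporary redirect right away, where the patent says that happens only sometimes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RedirectHandler:
    """Hypothetical sketch: routes redirects by HTTP status code."""
    # Temporary (302) redirects noted for immediate follow-up.
    follow_now: list = field(default_factory=list)
    # Permanent (301) redirect targets held for a later crawl.
    deferred: deque = field(default_factory=deque)

    def handle(self, source_url, status, target_url):
        if status == 302:
            # Assumption: always follow right away so the crawler can
            # decide which page to show in search results.
            self.follow_now.append((source_url, target_url))
            return "follow_now"
        if status == 301:
            # Queue the target; it might not be crawled for days.
            self.deferred.append(target_url)
            return "deferred"
        return "not_redirect"


handler = RedirectHandler()
print(handler.handle("http://example.com/a", 302, "http://example.com/b"))
print(handler.handle("http://example.com/c", 301, "http://example.com/d"))
```

A real crawler would also need to decide when to drain the deferred queue; the sketch leaves that out because the post only says it could take up to a week or more.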


How a Search Engine May Choose Search Snippets

When you search at Google or Yahoo or Bing, you’ll see a set of search results that include a page title, a summary or snippet of the page, and a URL indicating the address of the page.

Often, that combination of title, snippet, and URL will be the deciding factor as to whether or not someone clicks through search results to a page.

The snippet performs a couple of functions – it gives you a summary of what the page is about, and it shows you the context within which your query terms might appear on a page.

Sometimes a search engine will show you the Meta Description that the publisher of the page has come up with for a page, especially if the Meta Description contains the words found in the query.

Sometimes a search engine will show you a description that isn’t even found on the page, if it decides that the page is relevant for a query but the description for the page at the Yahoo Directory or DMOZ makes a better snippet than the meta description or any of the content found on the page.
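One way to picture the choice between those snippet sources is a small scoring function. The Python below is a rough sketch and not how any search engine is known to do it: the `choose_snippet` function, its parameter names, and the scoring rule (count how many distinct query words a candidate contains, with ties going to the meta description, then page text, then the directory description) are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def choose_snippet(query, meta_description, page_excerpt, directory_description):
    """Hypothetical sketch: pick the candidate sharing the most query words.

    Candidates are checked in a fixed order, and only a strictly higher
    score replaces the current best, so earlier candidates win ties.
    """
    terms = tokenize(query)
    candidates = [
        ("meta", meta_description),
        ("page", page_excerpt),
        ("directory", directory_description),
    ]
    best_source, best_text, best_score = None, "", -1
    for source, text in candidates:
        if not text:
            continue  # skip sources the page does not have
        score = len(terms & tokenize(text))
        if score > best_score:
            best_source, best_text, best_score = source, text, score
    return best_source, best_text


source, text = choose_snippet(
    "search patents",
    "Home page",
    "Welcome to my site",
    "Articles on search engine patents",
)
print(source, "->", text)
```

In this example the directory description wins because it is the only candidate that contains the query words, which mirrors the case where a description that isn't on the page makes the better snippet.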
