Google Patents Anchor Text Snippets

Somewhere out there is a universe that looks exactly like this one, and appears to run exactly like this one. Except something’s a little different. A little off. It’s as if search engines took a left turn instead of a right turn back in the early 2000s. Instead of using only meta descriptions, and possibly body text from web pages, as descriptive text, or snippets, for those pages in search results, they learned a new trick. Imagine that the content surrounding anchor text in links to a page was collected and evaluated based upon a quality score, and that this associated, and usually descriptive, text was used to generate snippets instead.

My thought on the possibility is that anchor text often doesn’t do the best job of describing a page, and links to a page often come from third parties who might not have the same interest in writing text that would make a good snippet for that page. But Google filed a patent for such an approach back in 2003, and it was granted this week, so they pursued what was described within the patent for over a decade. The patent does mention that headings on pages might also be used as potential snippets for pages, and provides the following example: “Computers > Algorithms > Compression”. But that’s a small part of the patent. They don’t limit it to anchor text that a site might provide itself, like in breadcrumb trail navigation for a page.

There’s also a part of this approach that recognizes that many pages have more than one link to them, so a choice would need to be made as to the best “snippet” to show.

The anchor text, and the text surrounding it, is called a “web quote” in this version of the patent. The patent refers to an earlier, provisional version that doesn’t use the term “Web Quote” for the text associated with anchor text. That earlier version, U.S. Provisional Application No. 60/363,559, filed Mar. 13, 2002, offers an alternative description of that kind of text:

In one technique for improving the quality of a document index, additional terms found near hyperlinks in documents are used to enhance the description of the linked document. The premise of this technique is that web authors tend to described or comment about the content of other web pages in the descriptive text located near the link to the other page. This descriptive text may be used to enhance the quality of the index.

That earlier version of the patent tells us that using such text might help in creating a more comprehensive document index. It also tells us that this kind of associated text often accurately summarizes the linked web page.

The older version of the patent doesn’t use the term “web quote”, but the idea that text near a link to a page can be very useful in creating a description of that page is the same in both.

Meta Descriptions as Snippets

During audits for websites, one recommendation that I frequently make is for clients to review and rewrite the meta descriptions for their pages. If they are well written, contain the query terms used to find a page, and are good fits for the page, a search engine might use them as snippets to describe that page. If they are engaging and persuasive, they might influence people to click through from search results to the pages they describe. I’ve written a few posts in the past year about when Google might also decide to use content from pages as snippets:

None of those posts even hint at the possible use of anchor text, and text associated with or surrounding it, as snippets for those pages.

Web Quotes as Snippets

In some cases, it is possible that neither content from meta descriptions nor content from the pages themselves provides the best choice of snippet for a page in Google’s search results. Would Google instead use anchor text pointing to the page from another page (and text that might be associated with that anchor text) as a description of the document in a snippet? In 2002 and 2003, when this patent was being developed, it must have sounded like an idea worth exploring. If there’s any chance that anchor text might be a good choice, that would explain why Google wouldn’t just abandon the idea, and abandon the patent. Then again, Google’s patent for the Google Directory was granted almost two years after Google sunset the Google Directory.

One aspect of this patent that I find really interesting is the collection of “Web Quotes” as possible snippets, based upon both anchor text and the text surrounding it, which might be within the same paragraph and meet some other rules indicating that it might be a good choice for a snippet. There might be multiple “Web Quotes” to choose from as a snippet for a page, since a page can have a lot of links pointed to it from across the Web. These Web Quotes might be ranked based upon a “quality metric associated with the web quotes of the examined documents.”
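To make the idea of a “Web Quote” a little more concrete, here is a minimal sketch of how one might pull anchor text and its surrounding paragraph from a linking page, using only Python’s standard `html.parser`. This is not Google’s method; it assumes that a “paragraph” is simply the text inside a well-formed, explicitly closed `<p>` element, and the class and function names (`WebQuoteExtractor`, `extract_web_quotes`) are hypothetical.

```python
from html.parser import HTMLParser


class WebQuoteExtractor(HTMLParser):
    """Collects (anchor_text, paragraph_text) pairs for links to target_url.

    Hypothetical sketch: only links inside explicitly closed <p> elements
    are kept; the patent's actual rules for "nearby text" may differ.
    """

    def __init__(self, target_url: str) -> None:
        super().__init__()
        self.target_url = target_url
        self.in_paragraph = False
        self.in_target_link = False
        self.paragraph_parts: list[str] = []
        self.anchor_parts: list[str] = []
        self.paragraph_anchors: list[str] = []  # anchors to target in current <p>
        self.quotes: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Start collecting a fresh paragraph.
            self.in_paragraph = True
            self.paragraph_parts = []
            self.paragraph_anchors = []
        elif tag == "a" and dict(attrs).get("href") == self.target_url:
            self.in_target_link = True
            self.anchor_parts = []

    def handle_endtag(self, tag):
        if tag == "a" and self.in_target_link:
            self.in_target_link = False
            anchor = " ".join("".join(self.anchor_parts).split())
            self.paragraph_anchors.append(anchor)
        elif tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            paragraph = " ".join("".join(self.paragraph_parts).split())
            # Pair every link to the target with its whole paragraph.
            for anchor in self.paragraph_anchors:
                self.quotes.append((anchor, paragraph))

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraph_parts.append(data)
        if self.in_target_link:
            self.anchor_parts.append(data)


def extract_web_quotes(html: str, target_url: str) -> list[tuple[str, str]]:
    """Return (anchor_text, surrounding_paragraph) pairs for target_url."""
    parser = WebQuoteExtractor(target_url)
    parser.feed(html)
    parser.close()
    return parser.quotes


page = ('<p>Bill explains how <a href="https://example.com/snippets">'
        'Google picks snippets</a> for search results.</p><p>Unrelated text.</p>')
print(extract_web_quotes(page, "https://example.com/snippets"))
```

Run against a page that links to the example URL, this prints a single pair: the anchor text “Google picks snippets” and the full sentence around it. A real system would need to handle unclosed tags, relative URLs, and text from nearby headers, which the patent abstract also mentions.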

The patent’s inventors include a number of search engineers who have been with Google since some of its earliest days, and a number of them are still with Google. (I think there’s a typo in the patent: the last name “Hank” should be Harik, as in Georges R. Harik, who was instrumental in launching Google AdSense and AdWords, and worked on projects such as “Gmail, Google Talk, Google Video, Picasa, Orkut, Google Groups and Google Mobile.”) Here’s the patent:

Using text surrounding hypertext links when indexing and generating page summaries
Invented by Jeffrey A. Dean, Martin Farach-Colton, Sanjay Ghemawat, Benedict Gomes, Georges R. Hank;
Assigned to Google Inc.
US Patent 8,495,483
Granted July 23, 2013
Filed: March 12, 2003

Abstract

Web quotes are gathered from web pages that link to a web page of interest. The web quote may include text from the paragraphs that contain the hypertext links to the page of interest as well as text from other portions of the linked web page, such as text from a nearby header. The obtained web quotes may be ranked based on quality or relevance and may then be incorporated into a search engine’s document index or into summary information returned to users in response to a search query.

What to look at in the patent

I don’t think that Google will decide to start using anchor text and text associated with it as snippets for search results in the future. The three links I listed above all describe some of the factors that Google might look for within the content of a page to decide what to show as a snippet for that page.

But if you enjoy alternative histories and science fiction like Philip K. Dick’s The Man in the High Castle, which describes a 1960s America occupied by Germany and Japan after they won World War Two, you might get that feeling of being in a left-handed universe while reading through the patent. There are some descriptions of how search engines worked back then, and of how this approach to snippets might improve upon the experience.

Could Web Quotes ever be used by Google to provide alternative snippets for pages, based upon links to the page instead of content that appears upon the page itself?

I’m going to experiment a little with the idea by creating hypertext links, from paragraphs like this one, to pages on my site about topics such as Google using alternative titles and snippets in some cases.

The patent does tell us that a quote generator might filter Web Quotes by looking for certain features in them. Here are the features listed in the patent:

  • The web quote’s length
  • Punctuation
  • Use of verbs
  • Positions of verbs
  • Use of adjectives
  • Etc.
  • How similar or different a Web Quote on one page pointing to a particular page is, compared to Web Quotes on other pages pointing to the same page
  • How similar a Web Quote is to other Web Quotes with links pointing to multiple pages
  • The PageRank of the page on which the Web Quote appears
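As a rough illustration of how features like these could be folded into a single “quality metric,” here is a small Python sketch. Everything in it is an assumption for illustration: the `WebQuote` type, the 8 to 30 word length range, the word-overlap (Jaccard) measure of agreement between quotes, and the weights are all hypothetical, since the patent doesn’t publish any formula. Verb and adjective features are left out because they would need a part-of-speech tagger.

```python
from dataclasses import dataclass


@dataclass
class WebQuote:
    text: str
    source_pagerank: float  # assumed to be normalized to the range [0, 1]


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two strings (0.0 to 1.0)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def quality_score(quote: WebQuote, others: list[WebQuote]) -> float:
    """Hypothetical quality metric built from a few of the listed features."""
    words = quote.text.split()
    # Length: favor quotes of roughly 8 to 30 words (an assumed range).
    length_score = 1.0 if 8 <= len(words) <= 30 else 0.3
    # Punctuation: favor quotes that end like complete sentences.
    ends_like_sentence = quote.text.rstrip().endswith((".", "!", "?"))
    punctuation_score = 1.0 if ends_like_sentence else 0.5
    # Agreement: average similarity to quotes from other linking pages.
    if others:
        agreement = sum(jaccard(quote.text, o.text) for o in others) / len(others)
    else:
        agreement = 0.0
    # Assumed weights; the patent does not publish any.
    return (0.3 * length_score + 0.2 * punctuation_score
            + 0.3 * agreement + 0.2 * quote.source_pagerank)


def rank_quotes(quotes: list[WebQuote]) -> list[WebQuote]:
    """Order all Web Quotes for one target page from best to worst."""
    scored = []
    for i, quote in enumerate(quotes):
        others = quotes[:i] + quotes[i + 1:]
        scored.append((quality_score(quote, others), quote))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [quote for _, quote in scored]


quotes = [
    WebQuote("A clear guide to how Google chooses snippets for search results.", 0.5),
    WebQuote("This guide explains how Google chooses snippets for its search results.", 0.4),
    WebQuote("click here", 0.9),
]
for q in rank_quotes(quotes):
    print(q.text)
```

In this toy example, the “click here” quote lands at the bottom even though it comes from the page with the highest PageRank, because it is short, isn’t a sentence, and shares no words with the other quotes pointing at the same page.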

Does that last one mean that a Web Quote linking to a page is more likely to be used as a snippet for that page if it’s on the front page of the New York Times than if it were on the front page of my local paper’s website (The Fauquier Times)? Maybe. The patent does tell us that how relevant a Web Quote is to the search terms used may be a stronger consideration than a quality value like PageRank.
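One simple way to express “relevance counts for more than quality” is a weighted blend in which the relevance weight is above one half. The sketch below assumes a crude relevance measure (the fraction of query terms found in the quote) and a relevance weight of 0.7; both choices, and the function names, are hypothetical rather than anything stated in the patent.

```python
def query_relevance(quote_text: str, query: str) -> float:
    """Fraction of query terms that appear in the quote (0.0 to 1.0)."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = {w.strip(".,;:!?\"'()") for w in quote_text.lower().split()}
    return len(terms & words) / len(terms)


def combined_score(quote_text: str, quality: float, query: str,
                   relevance_weight: float = 0.7) -> float:
    """Blend query relevance with a query-independent quality value.

    A relevance_weight above 0.5 makes relevance the stronger factor;
    0.7 is an assumed value, not one taken from the patent.
    """
    return (relevance_weight * query_relevance(quote_text, query)
            + (1 - relevance_weight) * quality)


def best_snippet(candidates: list[tuple[str, float]], query: str) -> str:
    """Pick the (text, quality) candidate with the highest combined score."""
    return max(candidates, key=lambda c: combined_score(c[0], c[1], query))[0]


candidates = [
    ("The New York Times front page story on city budgets.", 0.9),
    ("A local look at Fauquier County school budgets.", 0.3),
]
print(best_snippet(candidates, "school budgets"))
```

With the query “school budgets,” the local paper’s quote wins despite its much lower quality value, because it matches both query terms while the higher-quality quote matches only one.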

I will be keeping an eye out for the use of web quotes – anchor text plus text associated with that anchor text – as snippets in search results. If you see any being used by Google, please let me know. :)


8 thoughts on “Google Patents Anchor Text Snippets”

  1. Easy to see how an approach like that could help Google better understand a document and what it is about.

    Since they are comparing web quotes from different pages to one another, it seems reasonable that they might use web quotes with the Key Word In Context (KWIC) algorithm that generates the description as well. For example, today different queries produce different snippets in the search results for a given URL. Using what is described in the patent, different web quotes could help inform Google about the best description to show for a given query, instead of having the same description for every query a given page might be relevant for.

    I think one place you could try an isolated test with this would be to create a page that is blocked by robots.txt. Link to the page using link text like “click here” but surround the link itself with nouns as an example. Since Google can’t crawl the page, they fall back to third party signals – but since you’re not giving any useful signals in the anchor text – it’d be interesting to see if Google might use the surrounding text, such as the nouns, as the title or description of the document. I doubt you’ll see it in the description, from what I’ve noticed – Google appears to be quite consistent with how they display the description in the SERPs for a page that is blocked by robots.txt, but it might flow through to the title.

  2. Hi Bill,

    Great post. I have been waiting for someone to come up with this analogy of snippets for a while. I know there is more info underneath the covers…but a good start.

    Cheers
    Virginia

  3. Bill,

    This is a brilliant start to uncovering the future of anchor text variation/snippets. Thanks a lot for taking the time to follow google patents.

    Gregory Smith

  4. Hi Bill,

    Nice as always. As we know, Google is already using different methods to gauge and represent the value of web pages, like the meta description tag, body text, the Open Directory Project, and even different titles for brand-specific search terms. So I strongly feel that this one will be taken into account too, and would be an important factor in link-building.

  5. Triggerito, I like your case study. It really drives the point home. Bill, thanks for staying on top of the patents. While what they are doing is nothing new, we always find new “snippets” of information (pun intended) in these patent filings. Thanks.

  6. Thanks, Nick

    You’re welcome. Sometimes we learn more from some patents than from others, but I’ve been doing this for a long time, and I almost always learn something new, even if it’s a different perspective on things that we might be taking for granted sometimes. :)
