Using Anchor Text to Determine the Relevance of a Page
You go to a search engine and type some query terms into the search box. The search engine returns a list of results, and you follow a link to one of them.
Looking through the page, you may not see your query terms on the page itself. Why would the search engine return that result to you?
Determining Relevance from Anchor Text
One reason might be that the search engine is looking at the anchor text in links pointing to the page to determine that the page is relevant for your query terms.
This can be very helpful when a page has little or no text on it, such as a page that holds a video or an audio file.
A patent application from Microsoft explores the use of anchor text to define the context of a page, and to identify terms that the page might rank for even though they don’t appear on it.
It also discusses how the search engine might generate snippets for those pages which have been determined to be relevant for a query based upon the anchor text being used to point to those pages.
Using anchor text to provide context
Invented by Girish Kumar, Gaurav Sareen, Namita Gupta, Charles Lester Alexander Clarke, Junhua Wang
Assigned to Microsoft
US Patent Application 20080071739
Published March 20, 2008
Filed September 15, 2006
A search engine can provide referencing information as context for a particular search result when an excerpt from the search result, comprising at least some similar elements to the user’s query, is not generated.
Referencing information can include one or more anchor texts having similarity to at least some elements of the user’s query, the anchor texts being used by referencing pages to link to the page returned as a search result.
User selection of the anchor text can enable the user to visit a referencing page using that anchor text to link to the page returned as a search result, and having a high static rank.
There are many pages on the web that contain little or no text at all for a search engine to index. It’s possible that those pages may be very relevant to something that someone is searching for, but the lack of actual text upon the page may keep it from being indexed as relevant to that topic.
If a search engine can track the terms that other pages use within links to refer to that page, it can understand what others are saying the page is about. This process may cover HTML-based web pages, spreadsheets, word processing documents, PDFs, animation, audio, video, presentations (PowerPoint), and other documents.
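The idea of tracking link terms can be pictured as a small inverted index built from anchor text rather than page text. The following is only a minimal sketch and is not taken from the patent filing: the function names, the (source, target, anchor text) tuple layout, and the rule that every query term must appear somewhere in a page's inbound anchor text are all assumptions made for illustration.

```python
from collections import defaultdict


def build_anchor_index(links):
    """Map each anchor-text term to the pages it points at.

    links: iterable of (source_url, target_url, anchor_text) tuples
    (a hypothetical layout chosen for this sketch).
    """
    index = defaultdict(lambda: defaultdict(set))
    for source_url, target_url, anchor_text in links:
        for term in anchor_text.lower().split():
            # Remember which referencing pages used this term for the target.
            index[term][target_url].add(source_url)
    return index


def pages_matching(index, query):
    """Return target URLs whose inbound anchor text, taken together,
    contains every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    candidate_sets = [set(index.get(term, {})) for term in terms]
    return set.intersection(*candidate_sets)


links = [
    ("a.example/1", "video.example/clip.mp4", "funny cat video"),
    ("b.example/2", "video.example/clip.mp4", "cat clip"),
    ("c.example/3", "other.example/", "dog video"),
]
index = build_anchor_index(links)
print(pages_matching(index, "cat video"))
```

In this sketch the video file has no text of its own, yet it can still be found for “cat video” because two other pages describe it that way in their links.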
Choosing Snippets for Anchor Text Relevant Pages
A search engine usually tries to show a title, description, and URL for a page that it lists in search results. When the query term doesn’t actually appear upon the page, it may try to come up with a snippet that may be meaningful to the person searching.
If there is text around the anchor text in the link that contains elements of the searcher’s query, then that text surrounding the anchor text may be used as part of the snippet shown to a searcher.
When multiple pages point to a page, and each could supply a helpful snippet from the text surrounding its anchor text, the search engine might weigh some other factors in choosing which text to use as the snippet for the page, such as:
- The number of terms which the anchor text shares with the search string,
- The overall similarity of the anchor text to the search string,
- The language of the anchor text as compared to the search string and the results page,
- The differences between the anchor text, the query wording and the results page,
- The length of the anchor text,
- The static rank of the pages that contain the anchor text, and
- Other factors.
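A few of the factors listed above could be combined into a single score when choosing among candidate anchor texts. This is a rough sketch only: the patent filing does not give a formula, so the weights, the use of `difflib.SequenceMatcher` for overall similarity, the choice to penalize longer anchor text, and the `AnchorCandidate` structure are all assumptions. The language comparison factor is left out.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class AnchorCandidate:
    anchor_text: str
    context: str        # text surrounding the link on the referencing page
    static_rank: float  # query-independent rank of the referencing page (0..1 assumed)


def score(candidate, query, weights=(1.0, 1.0, 0.5, 0.1)):
    """Combine several factors into one number (weights are illustrative)."""
    w_shared, w_similarity, w_rank, w_length = weights
    query_terms = set(query.lower().split())
    anchor_terms = set(candidate.anchor_text.lower().split())
    shared = len(query_terms & anchor_terms)            # shared term count
    similarity = SequenceMatcher(
        None, query.lower(), candidate.anchor_text.lower()
    ).ratio()                                           # overall similarity
    length = len(candidate.anchor_text.split())         # anchor text length
    return (w_shared * shared
            + w_similarity * similarity
            + w_rank * candidate.static_rank
            - w_length * length)                        # assumed: shorter is better


def best_candidate(candidates, query):
    """Return the highest-scoring candidate, or None if there are none."""
    return max(candidates, key=lambda c: score(c, query), default=None)


candidates = [
    AnchorCandidate("click here", "To read the file, click here.", 0.2),
    AnchorCandidate("download adobe reader", "You can download Adobe Reader for free.", 0.9),
]
print(best_candidate(candidates, "adobe reader").context)
```

The winning candidate's surrounding text (`context`) is what would be shown to the searcher in this sketch.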
Query Dependent and Query Independent Snippets
It’s possible for a search engine to come up with a snippet for a page that will show up regardless of what query is used to find that page, intended to be an accurate description of the content of the page.
This would be known as a query independent snippet because it wouldn’t change even if the query did. For instance, the snippet for the main page of a company web site might be the company name, or the address of the page, such as “www.example.com.”
But a search engine might prefer to show some relationship between a page shown in search results and the query used to find the page.
It’s possible that the search engine may first want to provide a snippet that ties the query to the content of the page, even if it needs to use text from a page pointing to that page, instead of showing a query independent snippet.
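That order of preference can be sketched as a simple fallback chain: try an excerpt from the page itself, then text from a referencing page, and only then the query independent snippet. The function name, the parameters, and the plain substring matching below are assumptions for illustration and are not described in the patent filing.

```python
def choose_snippet(query, page_text, anchor_contexts, default_snippet, width=80):
    """Return a (kind, snippet) pair, preferring query dependent text.

    anchor_contexts: text surrounding links on referencing pages
    default_snippet: the query independent snippet for the page
    """
    terms = query.lower().split()

    # 1. An excerpt from the page itself, if a query term appears there.
    lowered = page_text.lower()
    for term in terms:
        position = lowered.find(term)
        if position != -1:
            start = max(0, position - width // 2)
            return ("page", page_text[start:start + width].strip())

    # 2. Otherwise, text from a referencing page that mentions a query term.
    for context in anchor_contexts:
        if any(term in context.lower() for term in terms):
            return ("anchor", context)

    # 3. Otherwise, fall back to the query independent snippet.
    return ("independent", default_snippet)


print(choose_snippet(
    "cat video",
    "",  # a video page with no text of its own
    ["Watch this funny cat video here"],
    "www.example.com",
))
```

A real system would likely match on whole words and rank the candidate contexts rather than taking the first match; the sketch only shows the ordering of the three choices.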
An Example from Google, Yahoo, and Live.com
If you type the phrase “click here” (without the quotation marks) into Google, Yahoo, and Live.com, you’ll see the Adobe Acrobat Reader or Flash download page show up in the top ten results in each search engine, even though the phrase doesn’t actually appear on the Adobe pages.
There are so many links pointing to those pages that use that phrase as anchor text within the link that the search engines associate the pages with that query.
The snippets that show up associated with that result in each of the search engines vary from one search engine to another:
Yahoo (Adobe Reader page)
Download for Adobe Reader, which lets you view and print Adobe Portable Document Format (PDF)
Live.com (Flash Reader page)
We are unable to locate a Web player that matches your platform and browser.
Google (Adobe Reader page)
Industries: Broadcast and media · Education · Financial services · Government · Life sciences · Manufacturing; Solutions: Consumer photo and video · Mobile …
The Yahoo snippet seems to be taken from the Yahoo Directory listing description for the Adobe Reader page, and the Google snippet is taken from text upon the Adobe Reader page itself. The Live.com snippet shows an error message that its search crawler probably saw when visiting a page linking to the Flash Reader page. Microsoft probably could have chosen a better snippet.
Another issue with using anchor text to determine what a page is about, one that we’ve seen with Google, arises when anchor text is used maliciously to describe a page in a way that differs from what the page is actually about. This has long been known as Google Bombing.
A page that has been Google Bombed may show up in search results for a phrase that doesn’t appear upon the page, but which jokingly or maliciously describes the content or topic of the page.
Google took some steps to limit Google Bombing last year, which they write about in A quick word about Googlebombs. They don’t reveal much detail about the algorithm used to limit Google Bombing.
The Microsoft patent filing doesn’t address how it might limit “Google Bombing.”
If you run a web site, you may have visitors coming to your pages based upon the anchor text in links pointing to your pages instead of the text upon your pages themselves.
You may be able to determine this by looking at the search referrals listed in your analytics or log files.
If you are receiving such visits, and the term is worth pursuing, you may want to look at how that result appears in the search engine result pages to see the snippet being used.
If the term is one that you want to be found for, you may want to consider adding some text to the page, if possible, using that query term, to provide a more persuasive snippet for the search results.
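As a rough way to check your log files for such visits, the sketch below pulls the query out of a search referrer and reports which query terms are missing from the text of the page that was visited. It assumes that the referrer URL carries the query in a parameter named `q` or `p`, which depends on the search engine and may not hold for every referral. The function names are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def query_from_referrer(referrer, param_names=("q", "p")):
    """Extract the search query from a referrer URL, or None if absent.

    param_names is an assumption about how search engines name the
    query parameter; adjust it for the referrers in your own logs.
    """
    parameters = parse_qs(urlparse(referrer).query)
    for name in param_names:
        if name in parameters:
            return parameters[name][0]
    return None


def terms_missing_from_page(referrer, page_text):
    """List query terms from the referrer that do not appear in page_text."""
    query = query_from_referrer(referrer)
    if query is None:
        return []
    page_words = set(page_text.lower().split())
    return [term for term in query.lower().split() if term not in page_words]


print(terms_missing_from_page(
    "http://www.google.com/search?q=click+here",
    "Download Adobe Reader",
))
```

Terms that show up in this list are candidates for visits driven by anchor text rather than by the words on the page.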