Yahoo Patents Anchor Text Relevance in Search Indexing

Yahoo was granted a patent this week which describes how anchor text in links may be used to increase the relevancy ranking of a page pointed to by that anchor text. The patent was originally filed in 2002, and it discusses how anchor text might work while naming the Altavista search engine as a possible place where the methods it describes might be implemented. Yahoo acquired the company that owned Altavista, and the technology is theirs.

While the patent is fairly old, it provides some details, which may not be widely known, about how a search engine might use anchor text when building its index.

It’s fairly common knowledge that the major commercial search engines pay attention to the anchor text in links pointing to pages, and may consider a page to be even more relevant for a query term if the term not only appears on a page, but also appears in the linked anchor text pointing to a page. Some pages may even be determined to be relevant for words that they don’t contain, but which show up in links to those pages.

However, we don’t know much about how much weight a search engine might give anchor text when it indexes a page, or whether and how some anchor text might be treated as more important than other anchor text.

Lately there has even been some discussion of experiments in which two links on the same page used different anchor text and pointed to the same target page. Both links appeared to pass relevance to that target page. See Google passes second link’s anchor text.

How Anchor Text is Broken into Tokens to be Weighed

The process is fairly simple. It starts with a search engine collecting a list of pages that have hyperlinks pointing to a specific page.

The anchor text from those links is retrieved and may be broken down into one or more tokens. For example, the anchor text pointing to a page might be “Best Louis Armstrong site.”

That anchor text might be broken down into the following tokens:

  • Best Louis Armstrong site
  • Louis Armstrong
  • Louis
  • Armstrong
  • Best
  • Best Louis
  • Best Armstrong
  • Best site
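The token list above can be reproduced with a short sketch. It assumes a hypothetical rule that the patent does not state: a token is any subset of the anchor's words, kept in their original order. That rule covers every token listed, including “Best Armstrong” and “Best site”, where words are skipped. The function name anchor_tokens is illustrative only.

```python
from itertools import combinations


def anchor_tokens(anchor_text: str) -> list[str]:
    """Return every order-preserving word subsequence of an anchor text.

    Hypothetical tokenizer: the example tokens skip words ("Best Armstrong"),
    so this sketch keeps any subset of the words in their original order.
    The patent may use a narrower or different rule.
    """
    # Drop surrounding spaces and a trailing period, then split into words.
    words = anchor_text.strip(" .").split()
    tokens = []
    # Emit the longest tokens first, then progressively shorter ones.
    for size in range(len(words), 0, -1):
        for combo in combinations(words, size):
            tokens.append(" ".join(combo))
    return tokens


tokens = anchor_tokens("Best Louis Armstrong site.")
print(len(tokens))  # 15
```

A four-word anchor yields 15 candidate tokens under this rule. A real engine might prune many of them, for example by capping token length.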

A weight might be calculated for each of those tokens. If the weight for a token exceeds a certain threshold weight, the page that the link points to might be indexed under that token.

The weight for tokens, or words and sequences of words, found in the anchor text is calculated using a formula that looks at how often each token can be found in anchor text pointing to that particular page, and how often the token appears in the search engine index.
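The description above names the two inputs to the formula but not the formula itself. Here is a minimal sketch that assumes a TF-IDF-style combination, a standard information-retrieval weighting swapped in here and not necessarily the patent's. The weight multiplies how often a token appears in anchors pointing to the page by the log of how rare that token is across the index. The function name token_weight and all the counts in the usage lines are hypothetical.

```python
import math


def token_weight(anchor_count: int, index_freq: int, index_size: int) -> float:
    """Hypothetical TF-IDF-style weight for one anchor-text token.

    anchor_count: number of anchors pointing at the page that contain the token.
    index_freq:   number of indexed documents that contain the token.
    index_size:   total number of documents in the index.
    """
    if anchor_count <= 0 or index_freq <= 0:
        return 0.0
    # More anchors using the token raise the weight, while a token that is
    # common across the whole index (such as "site") is pushed toward zero.
    return anchor_count * math.log(index_size / index_freq)


# Hypothetical counts for an index of 1,000,000 documents.
rare = token_weight(3, 2_000, 1_000_000)      # in 3 anchors, rare in the index
common = token_weight(1, 400_000, 1_000_000)  # in 1 anchor, common in the index
print(rare > common)  # True
```

Under this choice, a token found in every indexed document gets a weight of zero, however many anchors use it.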

Example

A web page to be indexed has a number of other pages linking to it.

The page is about the musician, Louis Armstrong.

The first linking page uses the anchor text “Louis.”

The second linking page uses the anchor text “Louis Armstrong.”

The third linking page uses the anchor text “best Louis Armstrong site.”

The fourth linking page uses the anchor text “Satchmo.”
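To weigh tokens across several inbound links, a search engine would first need to count how many of the anchors pointing to the page contain each token. The sketch below does this for the four anchors in this example. It assumes a hypothetical tokenizer that keeps every order-preserving subset of an anchor's words, and it lowercases the text so that “best” and “Best” match. The function names are illustrative.

```python
from collections import Counter
from itertools import combinations

# The four anchors from the example above.
ANCHORS = ["Louis", "Louis Armstrong", "best Louis Armstrong site", "Satchmo"]


def anchor_tokens(anchor_text):
    # Hypothetical tokenizer: every order-preserving word subsequence,
    # lowercased, returned as a set so each anchor counts a token only once.
    words = anchor_text.lower().split()
    return {" ".join(c) for n in range(1, len(words) + 1)
            for c in combinations(words, n)}


def anchor_token_counts(anchors):
    """Count how many inbound anchors contain each token."""
    counts = Counter()
    for anchor in anchors:
        counts.update(anchor_tokens(anchor))
    return counts


counts = anchor_token_counts(ANCHORS)
print(counts["louis"])            # 3
print(counts["louis armstrong"])  # 2
print(counts["satchmo"])          # 1
```

“Satchmo” appears in only one anchor, so it scores low on this count. A weighting that also rewards rarity in the index could still rank it well.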

The page to be indexed is highly relevant to the subject “Louis Armstrong” but it’s possible that conventional ranking methods may not rank that page as highly as it might deserve because the precise query terms may not appear in the page as frequently as in other, less relevant, pages. Or there may be many more sites that link to other less relevant pages about “Louis Armstrong”.

Each of the anchor texts pointing to the page may be broken down into tokens, just as “Best Louis Armstrong site” was in the example above.

When several tokens come from links pointing to the same page, it can help to determine which of those tokens are the most important and to assign weights to them accordingly.

This might be done by determining the weight of each token compared to the weight of all tokens pointing to the same page.
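One simple reading of “compared to the weight of all tokens” is to normalize the weights so that they sum to one. The sketch below assumes the raw weights have already been computed. The function name and the numbers are hypothetical.

```python
def relative_weights(weights: dict[str, float]) -> dict[str, float]:
    """Express each token's weight as a share of all token weights for one page."""
    total = sum(weights.values())
    if total == 0:
        # No signal at all: every token gets a zero share.
        return {token: 0.0 for token in weights}
    return {token: w / total for token, w in weights.items()}


# Hypothetical raw weights for a few tokens pointing at the Armstrong page.
raw = {"louis armstrong": 6.0, "satchmo": 3.0, "best": 1.0}
shares = relative_weights(raw)
print(shares["louis armstrong"])  # 0.6
```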

Some specific importance criteria might be used in the calculation of the weight of a token.

The greatest importance might be assigned to words that appear the least frequently in the search index, based on the idea that those words are more specifically related to the concept that a user would attempt to express.

A token that appears more frequently in anchor text pointing to a page could be given a higher weight.

Tokens that appear very frequently in the search index, such as “site” or “best”, might be discounted because they do not have specific importance in the context of the subject document.

Every token is assigned a weight and those tokens having a weight that is less than a threshold weight are discounted.

Tokens that are not discounted are counted as relevant for the page being indexed, with the tokens that have the greatest weights considered the most relevant.
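The discount-and-rank step described above might look like the following sketch. The threshold value and the sample weights are hypothetical, since the post doesn’t say how the patent sets the threshold. Tokens at or above the threshold are kept and sorted from most to least relevant.

```python
def relevant_tokens(weights: dict[str, float],
                    threshold: float) -> list[tuple[str, float]]:
    """Drop tokens below the threshold and rank the rest, heaviest first.

    The threshold is a hypothetical tuning parameter; how the patent
    chooses it is not described in the post.
    """
    kept = [(token, w) for token, w in weights.items() if w >= threshold]
    # Highest weight first; ties broken alphabetically for stable output.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


# Hypothetical relative weights for the Louis Armstrong page.
weights = {"louis armstrong": 0.45, "satchmo": 0.25, "louis": 0.2,
           "best": 0.06, "site": 0.04}
print(relevant_tokens(weights, threshold=0.1))
# [('louis armstrong', 0.45), ('satchmo', 0.25), ('louis', 0.2)]
```

Common words like “best” and “site” fall below the threshold and drop out. “Satchmo” survives even though it never appears alongside the musician’s name in the anchors.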

The Yahoo Patent is:

Method for ranking web page search results
Invented by Andrei Z. Broder and Farzin Maghoul
Assigned to Overture Services, Inc.
US Patent 7,398,461
Granted July 8, 2008
Filed: January 24, 2002

Conclusion

While this patent seems to have been applied for almost a lifetime ago, it provides some insights into the importance of anchor text in ranking pages that anchor text points towards. It also provides some insight into how much weight different words and phrases within anchor text might have when determining the relevancy relationship between those words and a page linked to with them.

Another patent from Yahoo was also granted this week on how network usage traffic can be used in ranking web pages, and in assigning weights to links and link text based upon how frequently those links are used. I’ve written about that over on the SEM Clubhouse, in How A Search Engine May Use Web Traffic Logs in Ranking Web Pages.


35 thoughts on “Yahoo Patents Anchor Text Relevance in Search Indexing”

  1. Yeah I always pay great attention to the anchor text pointing to my websites. This was a very detailed and descriptive post, great work. I especially like the part where you broke it down into possible tokens.

  2. This is an interesting post. Yahoo needs to find a way to make its results more relevant. But it also seems to be something that black hat SEO could take advantage of to manipulate rankings.

  3. I don’t know which is better when comparing Yahoo’s patent on anchor text with Google’s patent on anchor text. But so far, I can see that the results in Google are much more relevant than Yahoo’s search results.

  4. Anchor text has always played a part in my SEO efforts. I suppose taking it a step further and using anchor phrases would be included in this. I have always believed that a well written phrase or sentence that has a link embedded is worth a lot.

    On a side note… remember way back before Google dominated…. Alta Vista was my search engine of choice for several years…..

  5. Thanks, Chris (Pittsburgh SEO)

    I think the idea that search engines may be breaking down anchor text into tokens like that is something that hasn’t been discussed much.

    Hi Scott,

    Thank you. I do believe that Yahoo is working towards increasing the relevance of their results. Search really is in its infancy. This anchor text patent has been around for a long time, and I would guess that they have probably explored many of the ways it might be abused by content developers.

    Hi Player,

    I’m not sure that what Yahoo and Google are doing is very different, but we can’t know for certain. There are a lot of factors that might play a role in how anchor text helps a page rank for a phrase or a topic, and we may not know much about many of them.

    Hi Novice SEO,

    Altavista was a favorite of mine back in the day, too. But really only when using the advanced search. I hated most of the results from the normal search that I would receive.

    Anchor text is something anyone building a web page should give some thought to, whether for search engines or for the people who visit a site and might want to know where that link leads.

    Thanks Website Waves.

    There are lots of things to consider when creating link text. It does make things interesting.

  6. Interesting. I’d say an argument could be made that this infringes on Lycos’s first patent (#5748954). Among the claims in that much earlier patent is:

    3. The method of claim 1 wherein said step of processing said downloaded file includes the step of storing link text, and including the step of merging said stored link text to generate certain information about files referenced in the downloaded files for the catalog.

    Just my $0.02 from someone who was there way back when…

    -John.

  7. Hi John,

    I definitely appreciate your perspective. Thanks for stopping by and commenting.

    I looked through some of the history of this patent to see if the patent examiner might have raised any challenges to the inventors based upon the Lycos patent you’ve linked to, or any others. One that came up was this one:

    Method for ranking documents in a hyperlinked environment using connectivity and selective content analysis

    The Lycos patent does describe the collection and use of anchor text in the rankings of pages, along with describing other signals that might be used in ranking pages. It doesn’t appear to go into as much detail on the use of anchor text itself, or how different anchor texts might be weighed to determine how much relevance they may hold, if any at all.

    It’s possible that is a distinction that made a difference in the granting of this patent, though we don’t know if the patent examiner’s office considered the Lycos patent.

    It does appear that the use of the words “anchor” and “anchortext” played a strong role in the patent examiner’s search strategy, as defined in “Examiner’s search strategy and results” documents filed by the examiner during the patent process. The Lycos patent does refer to “link text” but “anchor” and “anchortext” don’t show up in the document at all.

  8. The concept of anchor text being broken down into tokens is novel. I have often wondered if or when search engines might do something like this with anchor text in links.

  9. Pingback: Grumpy Links the next installment • Tim Nash UK SEO Blog
  10. Wow kool article and an eye opener. Yahoo search algorithm seems to be cloudy and not much is known about it. Any clue about it is worth a lot.

    I was wondering whether what Yahoo has done is something unique. Didn’t we all know this all along, before they published it?

  11. Yep, I will give more importance to anchor text. I do agree with your statements.
    Nice post brother :)
    regards

  12. I wish this would give Yahoo some edge at least. I want to see someone give at least some kind of competition to the big G.

  13. I’m glad that you found this post to be helpful, dust collecting fool. Thanks.

    Hi people finder,

    I’m not sure that I’ve seen a statement from any of the search engines before about breaking anchor text down into tokens either. It was nice to see that set out in such a straightforward manner in the patent application.

    Hi Swami SEO,

    I think this patent provided a good indication that search engines may look at a few different factors involving anchor text before they consider whether or not that text should be considered relevant for the page being pointed towards. Making that more explicit is something that we may not have been told by a search engine before.

    Hi abdul,

    Thanks.

    Hi rcplinks,

    I think that competition amongst search engines is good for everyone – it provides choices to searchers who may prefer one search engine over another, and that’s a good thing.

  14. Very good article on the topic. I knew the importance of anchor text but didn’t know about tokens and how search engines can break a keyword into several different parts. This technique is not used only by Yahoo; it is being implemented on all search engines, and specifically Google.

  15. Hi Violin,

    Thanks. The patent does do a nice job of introducing how tokens made from terms in anchor text can impact how much weight that link text might have.

    It is quite possible that other search engines such as Google might be using something very similar.

  16. That’s why search engines love sites that have keywords inside. Sometimes tokenization leads to very weird results. A while back, I published an article on my blog that had many typos (I confess), and in Google Sitemaps I could see many weird terms for which my blog was indexed. As a whole, it is a wonderful phenomenon.

  17. To be honest, search engines and their bots are very unpredictable. I did nothing to one of my sites, but it is at the top of Google for many keywords which are not even present in the domain name. I think we should be natural, and everything (link building and keyword stuffing) should be natural.

  18. Hi New Age,

    It can be really interesting to look through your stats and see which phrases your pages end up receiving search engine traffic for. They sometimes aren’t what you might expect.

    Hi Steve,

    Writing for a particular audience means using language that they might relate to, that they might expect to see when you write on a particular topic. I think keeping that in mind can be helpful when it comes to search engines and search traffic.

  19. I think that anchor text does matter to some extent, but I don’t think people should be too obsessed with it. As long as the content on your website is relevant, relevant anchor text will only be a bonus to your SERPs.

    Thanks a lot for the post though :) Helps a lot!

  20. Hi Oliver,

    Appreciate your comment.

    Anchor text does matter, to an extent. I think that there are at least three important goals with the anchor text to a link.

    The first is that the text of the link should be written in a way that makes people interested in clicking through to see the document on the other side of the link.

    The second (and just as important), is that the text of the link should give people an idea of what they might find on that document being linked to.

    The third is that the text may also help a search engine understand what it might find on the other side of that link. That may mean using a keyword phrase that the page is being optimized for. If the keyword phrase is chosen well, and the phrase truly is about what is on that page, it may be easier to include that phrase in the anchor text of the link pointing to the page.

    But, make sure that the anchor text chosen follows those first two goals first.

  21. Hi Milo,

    I don’t think that this patent says that the value of a backlink is more important to the ranking of a page than the value of anchor text used in that link, or even the other way around. PageRank, or link equity, is a value that might be considered query independent, since it is an indication of the “importance” or quality of a page. Anchor text might be considered query dependent, since the text within a link (or possibly surrounding a link) might help a page rank more highly for terms used or possibly for related terms. With ranking algorithms that might consider hundreds of different ranking factors, it would be difficult to gauge how much influence one might have compared to another.

  22. Pingback: Raven SEO Weekly Digest - Issue 33 « Internet Marketing Blog
