How a Search Engine might Weigh the Relevance of Anchor Text Differently

One of the things that’s clear about how search engines work is that when they find a link pointing to a page using certain anchor text, that page might be seen to be a little more relevant for the text found in that link. Google pointed that out in one of the earliest white papers about how the search engine works:

This idea of propagating anchor text to the page it refers to was implemented in the World Wide Web Worm [McBryan 94] especially because it helps search non-text information, and expands the search coverage with fewer downloaded documents. We use anchor propagation mostly because anchor text can help provide better quality results. Using anchor text efficiently is technically difficult because of the large amounts of data which must be processed. In our current crawl of 24 million pages, we had over 259 million anchors which we indexed.

The Anatomy of a Large-Scale Hypertextual Web Search Engine

But many people assume that each link, with its anchor text, is as important as any other link, and that a page with lots of links pointing to it using certain anchor text will rank more highly for the terms in that text than it would without those links.

A recently published patent application from Microsoft describes how they might weigh the relevance of anchor text links differently based upon relationships between the pages where those links are found, or the pages they point towards. A number of Google patents also describe ways that they might weigh anchor text differently.

The Microsoft patent filing points out some examples of when one or more links pointing to a page might not carry as much anchor text weight as you might assume they would.

The first example is when you have one or more mirror sites of a particular site. When links and their related anchor text point from those mirror sites to another site, it isn’t really helpful from a search engine ranking perspective to count those links and their anchor text more than once.

A second example is when two anchor text links come from two websites that have a cooperative relationship, where the sites are under the control of the same or related users and tend to have “a substantial number of the same or similar anchor text links.”

A third situation described in the patent application is when anchor text links appear to be purposely created to boost the search rankings of a destination page.
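For the first of those examples, the idea of counting mirrored links only once can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent’s method: it assumes mirror relationships have already been detected and are supplied as a hypothetical `mirror_of` dictionary that maps a mirror’s host name to its canonical host, and it simply counts each anchor text once per canonical site.

```python
from urllib.parse import urlparse


def count_anchor_votes(links, mirror_of):
    """Count anchor-text "votes" for one destination page.

    links: list of (source_url, anchor_text) pairs pointing at the page.
    mirror_of: hypothetical map from a mirror host to its canonical host;
    how mirrors are detected is outside the scope of this sketch.
    """
    seen = set()   # (canonical_site, anchor) pairs already counted
    votes = {}     # anchor text -> number of distinct canonical sites
    for source_url, anchor in links:
        host = urlparse(source_url).netloc.lower()
        site = mirror_of.get(host, host)  # collapse mirrors onto one site
        key = (site, anchor.lower())
        if key in seen:
            continue  # same anchor from the same (mirrored) site: count once
        seen.add(key)
        votes[key[1]] = votes.get(key[1], 0) + 1
    return votes


links = [
    ("https://a.example/page", "blue widgets"),
    ("https://mirror.example/page", "blue widgets"),  # mirror of a.example
    ("https://b.example/post", "blue widgets"),
]
print(count_anchor_votes(links, {"mirror.example": "a.example"}))
# prints {'blue widgets': 2}
```

Without the mirror mapping, the same three links would count as three votes for “blue widgets.”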

The patent is:

Using Anchor Text With Hyperlink Structures for Web Searches
Invented by Zhicheng Dou, Junyan Chen, Ruihua Song, and Ji-Rong Wen
Assigned to Microsoft Corporation
US Patent Application 20110238644
Published September 29, 2011
Filed March 29, 2010

Abstract

This document describes tools for adjusting anchor text weight to provide more relevant search engine results. Specifically, these tools take advantage of a site-relationship model to consider relationships not only between an anchor text source site and a destination page but also relationships between multiple anchor text source sites to improve web searches.

Consideration of these relationships aids in determining a new anchor text weight, which in turn results in more relevant search results.

When reading through this Microsoft patent application, I was reminded very much of a Google patent that I wrote about last year in my post, Google’s Affiliated Page Link Patent, in which Google described how they might attempt to gauge how related sites that linked to each other might appear to be while determining how much weight to pass along from one page to another. That patent didn’t use the word “PageRank” once, and it’s possible that the method described in it would apply both to a link quality measure like PageRank and to anchor text relevance.

I was also reminded of Google’s Reasonable Surfer patent. But while that patent looked at a wide range of features that might be associated with anchor text found on a page, it seemed to focus more upon how much link weight or PageRank might be passed along from any one link found on a page.

Google’s Phrase-Based Indexing Patents also provide different weights associated with anchor text based upon a number of different factors, as noted in the following passage from the first of those patents, Phrase-based indexing in an information retrieval system:

A given document d in the document collection may have some number of outlinks to other documents. Each outlink (a hyperlink) includes anchor text and the document identifier of the target document. For purposes of explanation, a current document d being processed will be referred to as URL0, and the target document of an outlink on document d will be referred to as URL1.

For later use in ranking documents in search results, for every link in URL0, which points to some other URLi, the indexing system 110 creates an outlink score for the anchor phrase of that link with respect to URL0, and an inlink score for that anchor phrase with respect to URLi. That is, each link in the document collection has a pair of scores, an outlink score and an inlink score.*

* My Emphasis
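The quoted passage describes bookkeeping more than a formula: every link gets a pair of scores, one tied to the linking document (URL0) and one tied to the target document (URLi). A minimal Python sketch of that layout follows. The `phrase_score` rule in it is a hypothetical placeholder that only checks whether the anchor phrase appears in a document’s text; the patent’s actual scoring involves things like “good” and “related” phrases, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source_url: str     # URL0: the document being processed
    target_url: str     # URLi: the document the outlink points to
    anchor_phrase: str


def phrase_score(phrase: str, doc_text: str) -> float:
    # HYPOTHETICAL placeholder: 1.0 if the anchor phrase occurs in the
    # document's text, 0.5 otherwise. Not the patent's scoring rule.
    return 1.0 if phrase.lower() in doc_text.lower() else 0.5


def score_links(links, texts):
    """Return {link: (outlink_score, inlink_score)} for every link.

    outlink score: anchor phrase scored with respect to URL0 (the source).
    inlink score: anchor phrase scored with respect to URLi (the target).
    """
    return {
        link: (
            phrase_score(link.anchor_phrase, texts[link.source_url]),
            phrase_score(link.anchor_phrase, texts[link.target_url]),
        )
        for link in links
    }


texts = {
    "URL0": "Notes on how anchor text is weighted",
    "URL1": "A beginner's guide to anchor text",
}
links = [Link("URL0", "URL1", "anchor text"), Link("URL0", "URL1", "cheap widgets")]
for link, pair in score_links(links, texts).items():
    print(link.anchor_phrase, pair)
# anchor text (1.0, 1.0)
# cheap widgets (0.5, 0.5)
```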

The patent goes on to explain how the phrase-based indexing approach might weight anchor text differently depending on whether or not the anchor phrase is a “good phrase,” whether or not it intentionally appears within the document being pointed towards, and whether or not it is a “related” phrase as defined by that patent.

The Microsoft patent filing doesn’t cover the different types of instances described in those Google patents, but it does try to understand the relationships between sites that point to the same page using the same anchor text, as well as the relationship between a linking site and the page it links to.

When multiple pages link to the same page using the same anchor text, those linking pages might be examined to see whether they tend to link to many of the same or similar pages. If there’s a large amount of overlap, the weight of the anchor text from those pages might be reduced.
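One way to picture the overlap test is to compare the sets of pages each linking page points to. The sketch below uses Jaccard similarity (shared targets divided by all targets) as the overlap measure and a simple “1 minus overlap” discount. Both choices are assumptions made for illustration; the post does not say how the patent quantifies overlap or by how much the weight is reduced.

```python
def jaccard(a: set, b: set) -> float:
    """Share of link targets two pages have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_discounted_weights(outlinks: dict) -> dict:
    """outlinks maps each source page (all linking to the same destination
    with the same anchor text) to the set of pages that source links to.

    Hypothetical rule: a source's anchor-text weight is 1 minus its highest
    overlap with any other source, so heavily overlapping sources count less.
    """
    weights = {}
    for src, targets in outlinks.items():
        overlaps = [jaccard(targets, other)
                    for other_src, other in outlinks.items() if other_src != src]
        weights[src] = 1.0 - max(overlaps, default=0.0)
    return weights


outlinks = {
    "site-a/page": {"dest", "x1", "x2", "x3"},
    "site-b/page": {"dest", "x1", "x2", "x3"},  # links to the same pages as site-a
    "site-c/page": {"dest", "y1"},
}
print(overlap_discounted_weights(outlinks))
```

Here site-a and site-b link to exactly the same pages, so this deliberately blunt rule discounts their anchor text to zero, while site-c shares only the destination with them and keeps a weight of about 0.8. A real system would likely use a gentler discount and combine it with other signals.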

When a single page links to a page on another site, the level of “dependence” the destination page has upon the linking page might be explored. For example, if the linking site includes links to a large number of pages on the site being linked to, then the relationship between the sites can be seen as a dependent relationship, and the weight of the anchor text in those links might be reduced.
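The “dependence” idea from the paragraph above can be sketched in a few lines. Here dependence is assumed, for illustration only, to be the share of a source site’s distinct outbound link targets that sit on the destination site, with anchor text discounted once that share passes a hypothetical threshold of 0.5. The patent filing’s actual measure may be quite different.

```python
from urllib.parse import urlparse


def host(url: str) -> str:
    return urlparse(url).netloc.lower()


def dependence(source_outlinks, dest_host: str) -> float:
    """Fraction of the source site's distinct outbound targets on dest_host."""
    targets = set(source_outlinks)
    if not targets:
        return 0.0
    on_dest = [u for u in targets if host(u) == dest_host]
    return len(on_dest) / len(targets)


def dependence_adjusted_weight(base, source_outlinks, dest_host, threshold=0.5):
    # Hypothetical rule: above the threshold, treat the relationship as
    # dependent and scale the anchor-text weight down by the dependence.
    d = dependence(source_outlinks, dest_host)
    return base * (1.0 - d) if d > threshold else base


heavy = [f"https://shop.example/p{i}" for i in range(4)] + ["https://news.example/a"]
light = ["https://shop.example/p1"] + [f"https://site{i}.example/" for i in range(4)]
print(dependence_adjusted_weight(1.0, heavy, "shop.example"))
print(dependence_adjusted_weight(1.0, light, "shop.example"))
```

The “heavy” source sends four of its five links to shop.example, so its anchor text weight drops to roughly 0.2; the “light” source sends only one of five there and keeps its full weight of 1.0.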

The patent filing does provide more details on how relationship models between sites might be recognized, including links hidden within a site’s source code that might have been added by an untrusted third party.

Conclusion

How much relevance weight does the anchor text from a link provide to the ranking of a page in search results?

It may depend upon a number of factors, including how related or affiliated the search engine thinks the linking sites are to the site being linked to, how related or affiliated the sites linking to the destination page might be to each other, how related the text or phrase in the link might be to the page being linked to, and possibly some “reasonable surfer” type features associated with the link.

I’ve seen the question raised in a number of places about how much weight, either PageRank or hypertext relevance, might be passed along to a page when there are two or more links on the same page to another page. It’s possible that a search engine might collapse those links together and treat them as a single link, or it might derive different weights for them based upon reasonable surfer type features and then possibly discount their total value.
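The two possibilities in the previous paragraph are easy to contrast in code. In this hypothetical sketch, `keep_first_link` keeps only the first link on a page to each target URL, while `discounted_total` instead sums per-link weights (which might come from reasonable surfer type features) so that each additional link to the same target counts for less. The 0.5 damping factor is an arbitrary assumption, not a value from any patent.

```python
def keep_first_link(links):
    """links: (target_url, anchor_text) pairs in the order they appear on a page.
    Collapse repeated links to the same target, keeping the first one."""
    seen = set()
    kept = []
    for target, anchor in links:
        if target not in seen:
            seen.add(target)
            kept.append((target, anchor))
    return kept


def discounted_total(weights, damping=0.5):
    """Combine weights of several links to one target: the strongest link
    counts fully, the next at `damping`, the next at damping**2, and so on."""
    ordered = sorted(weights, reverse=True)
    return sum(w * damping ** i for i, w in enumerate(ordered))


page_links = [("/a", "red widgets"), ("/b", "blue gadgets"), ("/a", "widgets")]
print(keep_first_link(page_links))
# prints [('/a', 'red widgets'), ('/b', 'blue gadgets')]
print(discounted_total([0.4, 0.6]))  # 0.6 + 0.4 * 0.5
```

With the first approach the anchor text “widgets” from the second link to /a is simply ignored; with the second, both links contribute, but their combined value (about 0.8 here) is less than the plain sum of 1.0.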

It’s also possible that the anchor text relevancy of those links could depend upon how related the text within the links might be to the page being pointed towards.

There may be other issues involved as well in how much relevance anchor text might pass along to another page, but the Microsoft and Google patent filings provide some interesting starting points for exploring that topic.


58 thoughts on “How a Search Engine might Weigh the Relevance of Anchor Text Differently”

  1. Bill,

    Absolutely killer post. With the Panda updates now appearing to curb the spam issue somewhat, it looks as if Google is catching up. Interesting that an outlink score is measured; through our testing it now appears to affect rankings considerably, and high quality pages usually have this trait. Are you noticing similar results?

  2. I like this in concept, but I have recently run a few tests to see if I could achieve a very high ranking by getting exact anchor links from completely garbage pages that are obviously poorly spun articles, and they have continued to rank well for competitive terms. I’m fairly certain Rand has basically begged for this on a few different SEOmoz posts over the last year or two…seems the technology’s in place, I’m just not sure how effectively (or if) it’s really being implemented.

  3. Bill,

    Glad we have pros like you to keep us up to date.

    The anchor text relevance seems to be adding more variables to the simple “get a link from an authority” attitude. The convergence of multiple anchor text link (scores) linking to the same site is very interesting. The other cases make sense: getting links from thematically related pages, text in body content surrounding the anchor text, and some of the neighborhood factors are all good for providing the true relevance signals that, IMO, we all should know. But to see documentation speaking of these good practices is cool. I can see how just getting a link from a website with a high DA score may not help your cause, which could be good, as well as getting low “hood” related links; both are good for Panda type cleansing. Once the linked web is at full scale, I can see how they might look at existing and matching mark-ups from site to site and page to page. There are now even mark-ups to let search engines know where the important links on your page are. Now the only question is whether to get an understanding of the specific weights that matter related to anchor text relevance, or just do things the right way and look for sites that are really related to yours.

  4. I wonder if sites with cooperative relations would be identified by having a number of similar anchor texts, or is Google going to look for server data or even provider data?

  5. Fascinating and I’m sure a ton of blog networks are soon to be devalued.

    The mind boggles when you start to think about the number of techniques that might in future be used to establish dependency or at least co-operation between the sites doing the linking out. It could be in some cases that any one technique may only provide a weak signal of co-operation but when combined with other techniques add up to much stronger evidence. For example, if all the sites used similar anchor text AND all the links materialised within a short timeframe AND test 3, test 4 etc.

  6. Great post Bill, it’s given me some food for thought. Sometimes I wish link building results were a bit easier to manage!

  7. Hi,

    So this means that Off-Page SEO techniques like link exchanges and other types of cooperative relationships won’t have the same effect anymore? Not that I thought they were carrying much anchor text weight today either, but with these changes I guess such link building techniques will be almost useless.

  8. Hi Dave,

    Thank you. The phrase-based indexing patents are definitely highly recommended reading for anyone who might be interested in how Google may be evaluating the value of anchor text relevance, and the phrase co-occurrence reranking and spam identification approaches that they bring to Google.

    While Panda and phrase-based indexing are separate things, I suspect that if you explore the concepts and ideas behind phrase-based indexing and work to implement some of them on your sites, chances are good that you will be moving towards having higher quality sites.

  9. Hi larsoze,

    It’s really hard to perform the kinds of tests that you describe, and truly measure the value of the things that you implement.

    Anchor text pointed to pages should ideally help the page being linked to rank higher for the terms used, but there are a lot of potential variables in how much weight they may pass along, which is what this Microsoft patent describes. Chances are that linking using both exact phrases and related phrases as well might provide even more benefits in the right circumstances. The difficulty sometimes is understanding those circumstances. :)

  10. Hi Scott,

    Thanks.

    But to see documentation speaking of these good practices is cool.

    When I look through the titles of newly granted or published patents, sometimes I’m underwhelmed by what they appear to contain. But every so often, as I start reading one, it starts describing something that I may have taken for granted in some way, but never really seen the kind of documentation that you describe. And it is pretty cool when I see one of those.

    I don’t think that it’s necessary to get links from sites that are substantially related to yours to benefit from them, and a variety of sites linking to you probably doesn’t hurt and could help a bit. It does seem like if the specific page that a link appears upon is at least somehow related to the anchor text used, and to the page being linked to, that’s not a bad situation at all.

    I’m a big fan of trying to create content that fills a need, is engaging, and of enough value that people who are interested in it might link to it, refer it to friends, bookmark it, print it, and share it with friends.

  11. Hi Jan-Willem,

    I don’t know that I would say that link wheels are of any more value than simple a-b-c linking approaches. The search engines do spend a lot of time trying to understand the links between pages, how those pages might be related to each other, and whether or not pages appear to have been created to provide something of value versus having been created solely or mostly to attempt to manipulate PageRank or hypertext relevance.

  12. Hi Patrick,

    Lessening the amount of hypertext relevance between sites that might be related in some way in this patent seems to be mostly (but not completely) aimed at a search engine not giving too much benefit or credit for anchor text and links that might be reasonable, such as a collection of news media sites from the same publisher pointing at each other, or some mirror sites that might link to the same pages.

    It’s possible that Google may try a few different approaches to get a sense of whether certain sites are affiliated in some manner. For example, in my link above to my post about Google’s affiliated pages patent, one of the things mentioned that Google might explore is if there are certain sites that tend to get visited frequently by the same searchers in the same search session.

  13. Hi Ewan,

    It is possible that an approach like this might be able to identify blog networks created for linking purposes based upon their shared links to other sites. When it comes to trying to prevent that kind of behavior, the search engines often aim at making it harder for individuals to manipulate search results by creating much more work for them.

    Good points on how multiple signals might be telling when they are looked at together. The mind does boggle.

  14. Hi Jenni,

    The better, more useful, more interesting and helpful the content you create at the front end, usually the easier it is to acquire links.

  15. Hi William,

    The patents that I’ve pointed to in my post definitely target that kind of behavior, and we know that the search engines are spending at least some of their time and energy coming up with new ways to identify link exchanges and cooperative arrangements. They do try to identify approaches that can impact as many sites as possible with as few false positives as they can.

    As I’ve mentioned in at least one comment above, in many cases the cooperative links they are concerned about are reasonable and legitimate, such as links between a publisher’s somewhat related sites. The focus isn’t on stopping those types of links as much as it is on not giving them so much credit that they become more valuable than links from other unaffiliated sites.

    With an approach like that described in the Microsoft patent, unaffiliated links will carry more value.

  16. So does that mean that search engines no longer depend only on the anchor text when evaluating links and their relevance? I mean, what other factors could the search engines be looking for while analyzing a link? As far as I can tell, search engines value the text on the page of the link and near it.

  17. It’s possible that a search engine might collapse those links together and treat them as a single link

    If two links are targeting the same URL, only the anchor text used in the first link is counted by Google. Having said that I’ve also seen evidence that adding a hash tag to the link works too (but as yet I’ve not tested this) for example;

    /blog/example-post#expos

    Nice post thanks, lee

  18. Hi Lee

    I know that some experimentation took place on the topic of two links targeting the same URL, by SEOmoz and some other sites, Matt Cutts was asked his opinion, and stated that “if the anchortext is the same, we’ll typically drop the second link.” See: http://www.linkspiel.com/2008/07/mattcutts-bat-phone/

    All of the experiments that I saw described on this subject ignored the possibility of things like phrase-based indexing and I don’t consider any of them valid.

  19. Hi Akash,

    There are a number of things that the search engines could look at when determining whether or not the relevance value of anchor text might be passed along, such as the relationship between multiple sites that point to the same URL or the relationship between the source site and the target page, as described in the Microsoft patent filing I’ve written about above.

    Regarding phrase based indexing, if the terms in the anchor text appear on the page being pointed to, that counts for a lot. If the terms in the anchor text are “related” (as defined by the phrase based indexing patents), that also can give the anchor text a higher value.

    If the link is in boilerplate on a page, the value of the anchor text might be minimized as well.

  20. Hi Bill,

    So, doesn’t that make the Title and description tags more important for the web pages if the search engines are not depending too much on the anchor text these days, because after all this is what defines what the page is all about (besides the content itself). So, does it add more value to the meta tags??

  21. Although very interesting, I’m not sure how much thought should be given to these patents or indeed how search engines might weigh anchor text differently. You point out in your reply to Jenni that “The better, more useful, more interesting and helpful the content you create at the front end, usually the easier it is to acquire links.” Doesn’t this fact mean that there is less need for us to consider ‘How a Search Engine might Weigh the Relevance of Anchor Text Differently’ I bet most of the quality links to pages on your site are there as a result of the great quality content you provide; not as a result of you yourself having mastered how to satisfy Google’s thirst for relevant anchor text from unconnected sites that contain content about a similar topic.

  22. @ Brian

    A clue to how much thought should be given to these patents might lie in the number of returning visitors to this blog.

    I find it interesting, useful and am pleased Bill is doing all the heavy lifting for us!

  23. I definitely believe that anchor text plays an important role in analyzing which site would be the best to sit on front page of Google, still the site that gives the links should be considered as well. Is it relevant? is it an authority website? is it not a bad neighbor? and etc.

  24. @ Brian

    the bottom line is you will rarely know exactly how the various link metrics work and even if you did, you wouldn’t know for long as the algorithms change too frequently.

    In relation to your comment “I’m not sure how much thought should be given to these patents” – Thought is something people engaged in on-line marketing do a lot, it’s our way of accessing inspiration.

    @ Ewan

    Couldn’t agree more :)

  25. @Ewan “A clue to how much thought should be given to these patents might lie in the number of returning visitors to this blog” Get real! As I spelled out in my previous comment, people return here to get good quality content. Serious SEO experts may use Bill’s interpretation of white papers and patents to improve their knowledge and tweak their services, but if you wanted to implement everything found on this site you would need a whole army of people.
    @Lee I guess you missed my point! You sound very, very good at thinking, and I’m glad all the thinking helps you access inspiration; perhaps you need to focus more on your reading technique. I’ve been reading Bill’s research for the best part of a year and picked up so much useful stuff along the way. That’s why I love SEO by the Sea, why he’s a respected expert in his field and why I’m here now! As you’re such an advocate of thinking, perhaps you should think about whether or not quality content helps to create useful links with suitable anchor text, proximity of related phrases and from sites that themselves have quality content (Which was my point).

  26. @Brian

    apologies, was a little judgemental of me :(

    Like your sarcasm by the way, very good, especially enjoyed the part about my reading technique, lol the wife agrees btw

  27. @Lee thanks and my apologies too! Sarcasm is said to be the lowest form of wit, so nothing to be proud of here. Wow, Bill is even bringing people together, powerful stuff!

  28. @ Brian

    Thanks for your reply. I think the meaning I inferred from your first comment may have differed slightly from the one you intended, so possibly a case of crossed wires. Thanks for the clarification.

  29. Great read Bill,

    I have always been curious to test this out as much as I can. Over my time building links, I have never noticed a difference in the rankings when I use non-related anchor text. Obviously I still use the related anchor text because I am a firm believer that you can drive numerous amounts of traffic by having related anchor text. Some people spend too much time on back links and never really utilize the traffic coming to those sites that they have linked to as well. Might as well kill two birds with one stone, right?

    John

  30. @Ewan Cheers! Let’s all get back to reading Bill’s great work. (Thanks again Bill for all the heavy lifting you do for us).

  31. Hi Akash,

    Titles for pages seem to be one of the more important signals that search engines may look at, though unfortunately there are sites where the site owners don’t use titles for pages that are very descriptive of the content on the pages being titled.

    We’ve been told over and over by the people at Google that they don’t use content found in meta descriptions and meta keywords as ranking signals, and I’ve written enough of both over the years, and watched the impact or lack thereof of both to believe that. Regardless of how the search engines might value anchor text in different circumstances, I don’t think that’s going to change.

  32. Hi Brian,

    I think an awareness of how search engines might weigh and measure the anchor text it finds pointing to your pages can be helpful in a number of ways, and I think looking at these patents can give us an idea of how search engineers might feel about anchor text. We know that they still value it very much, but they are trying to learn how to use it better as a ranking signal, how to rein it in in some circumstances, and how to avoid situations where it might be manipulated or abused.

    I do like to follow and recommend a linkbuilding approach of providing what I think might be useful and engaging enough information that people might find interesting enough to talk about, link to, and refer others to. I do know that the words I use within the title to blog posts, and in the content I create might influence the anchor text that people might use to point to a specific page or post. I’m not advocating that one shouldn’t acquire links from other sites, but rather stating that it can often be easier to acquire those links if you do things like help solve other people’s problems, or give them something interesting that they might want to share, or present them with a resource that they can build upon.

  33. Hi Ewan, Lee, and Brian

    Thanks. I appreciate your visits, and discussion here. I think it’s healthy to question some of the patents that I write about – it might take an army to implement some of the things that they point towards, but I do think we’re better off knowing about them, and thinking critically about what they might imply. If you see something in a patent that I write about that you don’t like or question, I’m happy to hear it. :)

  34. Hi Matt,

    Those are all good questions, and I think part of the aim of the Microsoft patent is to get a better sense of the sites that are doing the linking.

    So for instance, if you have two different sites that are pointing links at a page on a third site, and those two sites tend to link to a lot of the same pages, it’s a little like a warning signal to the search engine that they might be up to something. In that case, the anchor text from those two sites might not count as highly.

    What we don’t see in this patent is that the search engine might look at other things related to those two sites as well. For example, they might be mirror sites, in which case the search engine might decide to index one of them and not the other, because they would be providing substantially duplicate content. This patent focuses pretty much upon how much weight to give anchor text from the sites, and its focus is pretty narrow, but the kinds of things that you mention may also be things that the search engine is doing as well.

  35. Hi John,

    I like to try to use related text in the anchor text I use when I link as well, but sometimes when you’re linking the best place to do so in your content might not always involve the most descriptive text.

    Though I do have to say that it can be really difficult in many instances to understand how much weight or relevance your link might impart to the page that you’re pointing towards. I’ve seen some instances where I know that there’s a single link to a page using specific anchor text and it has made a difference, but often there are so many other things going on that it’s hard to tell.

  36. Exactly, that’s what I thought: title and description tags have become more important these days than ever before, yet many people still do not use them properly.

    BTW, enjoyed reading the conversation here.

  37. Good morning. After going out on my own from a rather boring and stuffy position in an agency, I wanted to be more creative.. as a result I am now at home with my new 27-inch iMac wanting to design websites for small local businesses.. but everyone from plumbers to an executive search client now wants SEO.. something I am aware of but not familiar with..

    What’s wrong with having a great site and great content? Apparently that’s not enough, so here I am furiously researching acres and acres of contradictory information on the best way to massage Google.

    In point of fact I’m not enjoying this process, but needs must.. I am glad to be here, as it seems rather more erudite than the sites offering to sell me 5,000,000,000 links for $9.99 for INSTANT results.

    It’s a struggle, but I will now load up the coffee machine and get to the bottom of this site to see if I can finally find out how to add value to my clients without subcontracting to some SEOs who in my mind are one octave higher than estate agents..

    How can I determine what makes a good SEO partner, and is there no room for morality? Do I have to become a monster in order to rank well?

    Thanks,

    Luke

  38. Hi Akash,

    Unfortunately, people don’t always do the greatest job of using titles and meta descriptions, and often don’t offer unique ones for all of their pages.

    Since the text in meta descriptions really isn’t used for ranking pages, its most important role is when people see it as a snippet describing a page in a search result, since it can influence people to visit the page it describes.

    Thanks.

  39. Hi Luke,

    Your questions are pretty far from the scope of this post. Maybe another one I wrote might help. See: Good SEO

    I think that post may provide answers for most of the questions in your comments.

  40. Bill, does your brain ever hurt from absorbing all the info in these patents? I am glad you do it, since you do a great job of distilling the info for those of us who are too lazy or busy to do the digging ourselves.
    I particularly like the idea that the relationships between sites are carrying more weight, as I have long been annoyed by the practice of using multiple sites as little more than complex landing or doorway pages. And of course anchor text spam is pretty annoying too.
    Thanks again for sharing a wealth of information.

  41. Pingback: The Complete Google Panda Reference Guide | The Milwaukee SEO
  42. Hi Nick,

    There’s so much to learn, and while it can be a pain digging through the legal language of the patents, it’s definitely worth doing.

    This patent did do a good job of exploring the idea that the analysis of links is a lot more complicated than simply things like whether or not a link on a page is the first link (of possibly more than one) to another page. I suspect that if we actually took some time to come up with other factors now that a search engine might look at when deciding how much “relevance” to pass along via anchor text in a link, we could have a pretty big list.

    It makes sense for them to look at the relationships between pages, especially since, as you point out, people create sites that are in essence landing pages or doorway pages for other sites. But even if they aren’t, it’s not unusual for people who own multiple websites to link out to their other sites as well, and this approach from Microsoft tries to impose some limits on the value of links between those sites. That seems to make some sense to do.

  43. Hi Bill,

    I think also position on the web page that includes the link counts, so a link in the footer or in the blog roll is not as important as in the middle of the page.

  44. Great post that will have us rethinking how related pages and dependent sites should link to each other. Many tweaks to be done after reading this post, thanks!

  45. Hi Alex,

    Good point. The Reasonable Surfer patent described that within the context of how much PageRank might be passed along by a link, and I think it’s also possible that may apply to anchor text relevance as well. That patent described a number of features that might be looked at together when deciding how likely it might be that someone might click upon a link found on a page, and I think some similar analysis might take place beyond just where the link is located on a page.

  46. Hi Eliseo,

    There’s been a lot of discussion across the Web about things like the value of sitewide links on a page, and how they may be diminished in value from both a pagerank and anchor text stance, but not as much when it comes to the relationships between pages and the possible impact of those relationships when it comes to how much hypertext relevance is passed along. I was happy to find this patent and its discussion of that topic.

  47. We have always known anchor text matters, but it seems to be mattering less these days, not more. We still pay attention to it, but we look more at the context on the rest of the page more than the specific anchor text.

  48. Thanks for the reply Bill.
    Definitely this topic and ensuing discussion is fresh and valuable to all SEOs. Especially getting this info from your site which we trust and rely on.
    I have relayed the info on our site in three languages, which goes to show how much we think this post gives food for thought and offers some concrete info on matters we have all speculated on.
    Some comments above talk about the probable de-indexing of blog networks when these patents are really implemented. It does seem that private networks that may not have a huge variety of outgoing links (in terms of domains receiving the outbound links) could be easily detected. I wonder if the huge paying networks like Linkvxyz and Linkvabc are still a safe bet? They do seem to have the necessary variety of outgoing links/anchor texts/receiving pages+domains to make them impossible to identify as related or dependent, don’t you think?
    Wishing all readers a great week-end!

  49. Hi Steve,

    It’s hard to tell the exact value of anchor text because there are so many other potential factors that can go into rankings for a page, but I think it’s worth thinking about and considering when doing optimization. I agree that you want to work on more things than just links and anchor text though.

  50. Hi Eliseo,

    You’re welcome.

    I do have to say that I haven’t used blog networks, and while I’ve seen a few, I’m not very familiar with them; there are lots of ways to attract and acquire links, and I tend to avoid ones that have an element of risk to them that I don’t want to rely upon.

  51. Pingback: Anonymous
