Last Friday, in a well-received and thoughtful Whiteboard Friday at SEOmoz titled Prediction: Anchor Text is Dying…And Will Be Replaced by Co-citation (since retitled at SEOmoz as Prediction: Anchor Text is Weakening…And May Be Replaced by Co-Occurrence), Rand Fishkin described how some unusual search results caused him to question how Google was ranking them.
I’m a big fan of looking at and trying to analyze and understand search results for specific queries, especially when they include results that appear somewhat puzzling, and I think those provide some great fodder for discussions about how Google might be ranking some search results. Thanks, Rand.
In Rand’s presentation, he pointed out that:
- Consumerreports.com ranks well for the term “cell phone ratings,” without using the terms “cell phone” or “ratings” (or using only some of those words, and not in the page title or in other prominent places on the page).
- Thomasnet.com ranks very well for the term “manufacturing directory,” without using those words, and without appearing to try to rank for them, based upon the content that appears on the page.
- SEOmoz’s Open Site Explorer also ranks very well for the term “backlink analysis,” again without using those words (except maybe in alt text on the page, and in the past as part of the title for the page).
While these are all terms that those pages might be expected to rank for, because they describe what those sites do pretty well, it doesn’t appear that any of the pages were purposefully optimized for those terms in their content.
The Death of Anchor Text?
We do know that many pages tend to rank well for some query terms because of links to those pages that include anchor text for those terms, even if the terms don’t appear on the pages, and this has been part of how Google has operated for years.
But I’m looking at the search results that Rand mentioned, and the pages do seem somewhat (ok, poorly) optimized for those terms.
In the snippet for Open Site Explorer, we do see “backlink analysis” appear highlighted, and that is taken from alt text for an image on the page.
In the snippet for Consumer Reports, on a search for [cell phone ratings], I’m seeing “cell phones” and “reviews” highlighted in the title for the page (which is http://www.consumerreports.org/cro/cell-phones-services.htm; Google shows the breadcrumb for that page in its search result rather than its URL), and Google shows the following snippet for the page, which also highlights those terms and a synonym for one of them:
In the snippet for the Thomasnet homepage on a query of [manufacturing directory], we also see some terms from Rand’s query highlighted in the snippet for the page, including the word “business” which Google might be using in this context as a synonym for “manufacturing”:
The term “directory” doesn’t appear on the page, or within its HTML code, but if Google hasn’t categorized the site as a directory at some point in the past, it’s time for Google to start over. The site is a manufacturing directory. Google has published a patent on reranking search results when a category for a query and a category for a page match, and it can boost a result in search results when those categories match (more on that below).
But let’s ignore the fact that these terms and/or synonyms for them do show up on the pages in some ways.
Regardless of whether pages are optimized on-page for certain terms, the hypertext relevance of links that use certain terms can also help pages rank for those terms, even if the terms don’t appear on the pages themselves. It used to be that when you looked at the cached copy of a page in a search, it would tell you whether or not the page ranked for your query terms if the terms didn’t appear on the page. None of the cached versions of these pages tell us that, but it’s possible that Google no longer shares that information with us.
For example, the page to download Adobe Reader ranked #1 for years for the term [click here] without having the phrase on the page (it does have the word “clicking”), because a lot of links on the Web pointing to the page use the anchor text “click here.” It’s at #2 right now, but that’s still pretty impressive.
The fact that these pages rank for Rand’s query terms without using them in substantial ways seems to be an argument that anchor text relevance could be as strong and powerful as ever, especially if there are a lot of links to those pages that use that anchor text.
But let’s say that there aren’t, even though it wouldn’t be surprising at all if Thomasnet has some nice links to it from some high-quality pages using the link text “manufacturing directory,” or the Consumer Reports page on cell phone services was linked to with anchor text using the phrase “cell phone ratings,” or that the Open Site Explorer was linked to with the anchor text “backlink analysis.”
Rand’s presentation doesn’t tell us whether or not he did some kind of backlink analysis (using OSE, maybe) to see how many links pointing at Consumer Reports, Thomasnet, or Open Site Explorer might have included those terms in their anchor text. That would have helped answer the question in Rand’s presentation about whether anchor text is dead, and if there weren’t a lot of links using those phrases pointing to those pages, it might have changed how we believe anchor text works as well.
So, let us ignore the use of some of the terms in the queries on the pages (especially since those pages are mostly poorly optimized for the terms), and let’s ignore whether or not anchor text pointing to those pages might influence how they rank.
Co-Citation or Co-Occurrence?
Before we dig further into those, though, the term Rand used to describe this phenomenon (co-citation) stirred up some cognitive dissonance in me. Things that co-cite cite things together. Here he’s not describing citations but rather whether or not terms tend to appear on the same pages. Citations don’t necessarily have to be linked, and if you search through Google Scholar, you’ll see lots of scientific documents that contain lots of footnotes and citations to other documents within them. PageRank could easily be said to be based upon those types of academic citations, and it has been many times.
Jim Boykin called co-citation a potential ranking factor back in 2006, in his post Co Citation – understanding how it affects your SEO, but he was talking about a very different concept, rooted in how different pages with similar content might be cited together by third-party sites; the more frequently that kind of co-citation happens, the more similar the pages being linked to might be considered to be.
Rand does tell us that he has noticed that the ranking pages he pointed out and the terms they ranked well for tend to co-occur frequently on the same pages.
It does seem that Rand is referring to co-occurrence, and his mention of how words do co-occur on many of the same pages made me think of Google’s Phrase-Based Indexing approach, which I’ll discuss in more detail a little later. Co-occurrence is an important part of Phrase-Based Indexing, and how “related” some terms might be can be based upon it. Perhaps Rand meant to use co-occurrence?
Under the Phrase-Based Indexing patents, the anchor text that uses related terms might also carry more or less weight based upon the strength of those relationships. Link to a page about sailboat rudders with the anchor text “doggie treats,” and the hypertext relevance of that link might not be the same as if you used something like “sailboats” or “rudders.” (The Phrase-Based Indexing patents are the only ones I’ve seen from Google that describe how they can be used to overcome “Google Bombing.”)
So let’s take a quick peek at some reranking approaches that could cause pages to rank well for a query term even though on their faces they might not seem as relevant for a term.
Reranking Algorithms on Top of Traditional Ranking Approaches
The examples that Rand provided do seem to go against a traditional ranking approach that looks at words and content used on a page to create an information retrieval ranking score when combined with a link analysis approach like PageRank, that can be used as an importance score for a page.
Page rankings for a query are often based upon such a combination. I have written several posts, though, that describe how these original rankings might then be influenced by a reranking approach that can boost some pages in rankings and reduce the rankings of others.
Here are a few of those:
- 20 Ways Search Engines May Rerank Search Results
- 20 More Ways that Search Engines May Rerank Search Results
- Another 10 Ways Search Engines May Rerank Search Results
I’ve written many additional posts about methods described in patents and papers that cover other reranking approaches as well. It’s probably time for one or more posts in this reranking series covering things like results for recency-sensitive queries, results influenced by social signals, rerankings where pages might be boosted when their categories match the categories of a query, and others as well.
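As a rough illustration of that two-stage idea (a base ranking from on-page relevance plus a link-based importance score, then a reranking step on top), here is a minimal Python sketch. It assumes a simple linear blend of a hypothetical on-page relevance score and a PageRank-like importance score, followed by multiplicative boosts from some reranking signal; the blend weight `alpha`, the boost values, and the example URLs are all made up, and Google hasn’t published how (or whether) it combines scores this way.

```python
def base_score(ir_score: float, importance: float, alpha: float = 0.7) -> float:
    """Blend an on-page relevance (IR) score with a PageRank-like importance score.

    alpha is a hypothetical weight chosen for illustration only.
    """
    return alpha * ir_score + (1 - alpha) * importance


def rerank(results, boosts):
    """Apply reranking boosts on top of the base ranking.

    results: list of (url, ir_score, importance) tuples.
    boosts: dict mapping url -> multiplicative boost from some reranking signal
            (recency, location, category match, ...); missing urls get 1.0.
    Returns (url, score) pairs sorted from best to worst.
    """
    rescored = [
        (url, base_score(ir, importance) * boosts.get(url, 1.0))
        for url, ir, importance in results
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


results = [("strong-onpage.example", 0.9, 0.5), ("weak-onpage.example", 0.6, 0.4)]
print([url for url, _ in rerank(results, {})])
# ['strong-onpage.example', 'weak-onpage.example']
print([url for url, _ in rerank(results, {"weak-onpage.example": 1.6})])
# ['weak-onpage.example', 'strong-onpage.example']
```

The point of the example is only that a page with a weaker base score can overtake a stronger one once a reranking signal applies to it, which is the kind of effect that can make a result look oddly placed.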
Localized Organic Results as a Reranking Approach
Google’s localized organic search results that may insert local results for certain competitive query terms into the first page of search results, like for dentists or hospice, is another example of a reranking approach in use. Those local pages may not be as relevant and/or as important (based upon PageRank in part or full) as the other results surrounding them, but they’ve been boosted in rankings because they are relevant for the locations of people performing those searches – even though the queries don’t include geographic terms.
In a search for [hospital] from my location in Virginia, Google shows local maps results first, an entry from Wikipedia, and then several results showing hospitals around me. These hospital pages likely don’t rank as highly as other pages for the query [hospital], but since they are nearby and are likely organically ranked well for the term, they have been boosted in search results via a reranking approach. There is a hospital in my town that isn’t listed there, but the localized organic algorithm isn’t the same one that powers Google’s local search.
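A minimal Python sketch of a localized organic reranking step might look like the following. It is not Google’s algorithm; it assumes that pages have known coordinates, that only pages already ranking organically within a hypothetical `top_n` are eligible, and that a hypothetical 50 km radius defines “nearby.” The URLs and coordinates are invented examples.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def localize(results, user_lat, user_lon, radius_km=50.0, top_n=100, keep_top=1):
    """Promote nearby pages that already rank organically within top_n.

    results: list of (url, organic_rank, lat, lon); lat/lon are None for
    pages without a known location. The first keep_top organic results
    stay in place, nearby pages follow, then everything else.
    """
    ordered = sorted(results, key=lambda r: r[1])
    head, tail = ordered[:keep_top], ordered[keep_top:]
    nearby = [
        r for r in tail
        if r[2] is not None
        and r[1] <= top_n
        and haversine_km(user_lat, user_lon, r[2], r[3]) <= radius_km
    ]
    others = [r for r in tail if r not in nearby]
    return [r[0] for r in head + nearby + others]


results = [
    ("wikipedia-hospital.example", 1, None, None),
    ("national-hospital-guide.example", 2, None, None),
    ("far-away-hospital.example", 3, 40.7, -74.0),
    ("local-hospital.example", 40, 37.05, -79.02),
]
print(localize(results, 37.0, -79.0))
```

Here the local page at organic position 40 jumps to second place for a searcher near it, even though the query contains no geographic terms, while the distant hospital page keeps its organic position.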
Other Reranking Approaches
Phrase-Based Indexing and Anchor Text Weight
Under Phrase-Based Indexing, not all anchor text carries an equal relevance weight, and Google’s phrase-based indexing patents describe how anchor text might be weighted based upon whether or not links use “related” phrases, which are found in documents where phrases returned in response to a particular query tend to co-occur. The more related the phrase, the more hypertext relevance it might pass along.
See Google’s patent Phrase-based indexing in an information retrieval system, and the section in the description with the heading “b) Ranking Documents Based on Anchor Phrases.”
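To make the idea of relatedness-weighted anchor text more concrete, here is a minimal Python sketch. It assumes a hypothetical precomputed dictionary of related phrases with relatedness strengths between 0 and 1 (standing in for the co-occurrence-based related phrases the patent describes), simple whole-word matching on whitespace, and full weight for anchors containing the query itself; none of these numbers or rules come from the patent.

```python
def contains_phrase(text, phrase):
    """True if phrase appears in text as whole words (case-insensitive)."""
    return f" {phrase.lower()} " in f" {text.lower()} "


def anchor_weight(anchor_text, query, related):
    """Relevance weight a single link's anchor text passes for a query.

    related: hypothetical dict of phrase -> relatedness in [0, 1].
    Anchors containing the query get full weight; anchors containing only
    related phrases get the strongest matching relatedness; others get 0.
    """
    if contains_phrase(anchor_text, query):
        return 1.0
    return max(
        (strength for phrase, strength in related.items()
         if contains_phrase(anchor_text, phrase)),
        default=0.0,
    )


def hypertext_relevance(anchors, query, related):
    """Sum the weights of all inbound anchors for the query."""
    return sum(anchor_weight(a, query, related) for a in anchors)


related = {"sailboat": 0.8, "rudders": 0.9, "tiller": 0.6}
for anchor in ["doggie treats", "replacement rudders", "Sailboat Rudders"]:
    print(anchor, anchor_weight(anchor, "sailboat rudders", related))
```

In this sketch, a “doggie treats” link to the sailboat rudders page passes nothing for [sailboat rudders], while a link using a related phrase passes most of the weight of an exact match, which is roughly the behavior the patent describes.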
Reasonable Surfer Model
The Reasonable Surfer model also shows how different links might be weighted based upon a combination of several features associated with those links, including their location on a page (for example, links in the main content might be given more weight than links in blog comments), how they are presented (font size, color, style), and their relevance to the content of the page they appear on and of the page the link targets.
I wrote about the patent in the post, Google’s Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data, and the patent is at Ranking documents based on user behavior and/or feature data
The patent doesn’t mention PageRank at all, though the inventors’ reference to a “reasonable surfer model” is a play upon the “random surfer model” that Lawrence Page used to describe how PageRank works. Instead of “PageRank,” the patent refers to the amount of ranking “weight” that a link might pass along from one page to another, and it includes some examples of how the anchor text used could influence that weight:
The patent includes many features associated with a link, and I’ve included a handful that involve which text is used within a link and how it might be analyzed.
- Actual words in the anchor text associated with the link
- Commerciality of the anchor text associated with the link
- A topical cluster with which the anchor text of the link is associated
- A degree to which a topical cluster associated with the source document matches a topical cluster associated with anchor text of a link
Under the Reasonable Surfer model, the relevance value of a link might be increased or decreased based not only upon the text used in a link, but also upon how relevant that text is to the text of the page it appears on and to the text of the page that the link targets. It might also be increased or decreased in value based upon how and where it’s displayed on a page.
Is the weight discussed only about PageRank, or is it also about hypertext relevance? Given that, under this approach, the actual text of links and the text on both the source and target pages matter, it’s possible that both are affected.
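The patent lists features like these, but the description above gives no formula for combining them. The following minimal Python sketch uses invented assumptions: every multiplier, the 14px font cut-off, and the `Link` fields are hypothetical, chosen only to show how feature-based weights could split the value a page passes unevenly across its links, where the random surfer model would split it evenly.

```python
from dataclasses import dataclass


@dataclass
class Link:
    target: str
    in_main_content: bool      # main content vs. footer, sidebar, blog comments
    font_size_px: int          # a presentation feature
    topical_match: float       # 0..1, how well the anchor's topic matches the source page (hypothetical)
    commercial_anchor: bool    # "commerciality" of the anchor text


def link_weight(link: Link) -> float:
    """Combine link features into a single weight (illustrative multipliers only)."""
    weight = 1.0 if link.in_main_content else 0.3
    if link.font_size_px >= 14:
        weight *= 1.2
    weight *= 0.5 + link.topical_match  # ranges from 0.5 to 1.5
    if link.commercial_anchor:
        weight *= 0.6
    return weight


def distribute(page_value: float, links: list) -> dict:
    """Split the value a page can pass in proportion to link weights,
    instead of evenly across links as in the random surfer model."""
    weights = [link_weight(link) for link in links]
    total = sum(weights)
    shares = {}
    for link, weight in zip(links, weights):
        shares[link.target] = shares.get(link.target, 0.0) + page_value * weight / total
    return shares


links = [
    Link("cited-resource.example", True, 16, 0.9, False),
    Link("spammy-shop.example", False, 12, 0.2, True),
]
print(distribute(1.0, links))
```

A prominent, topically matched link in the main content ends up carrying most of the page’s value, while a small, commercial-anchor link in a comment area carries very little.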
Reranking Based on Categories
Google may assign categories to web pages, and categories to query terms. When search results for a query that has been given a category are returned, web pages that fit that category might be boosted in search results. Chances are good that the query [manufacturing directory] is in a “directory” category, and maybe even in a “business directory” category. Chances are also good that the home page of Thomasnet.com is also in the same categories.
If this reranking based on categories algorithm is being used, then the Thomasnet page could have been boosted in search results. See: How Google May Use Categories as a Search Ranking Factor
There’s likely also some kind of entity association or category association or both involving the brands involved (one type of entity) and the terms used. This is another factor that could easily boost those particular pages in search results, regardless of the Information Retrieval score and PageRank score. See 10 Most Important SEO Patents: Part 6 – Named Entity Detection in Queries for some discussion on one approach that Google uses to rerank search results based upon entity associations.
ConsumerReports.org, Thomasnet.com, and the Open Site Explorer could all be considered entities, and the query terms in Rand’s presentation are all terms that could be considered by Google’s entity approach to be associated with those specific entities.
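The category-matching reranking described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented method: it assumes category labels for queries and pages are already available, and the labels, scores, and the 1.5 boost factor are hypothetical. An entity association could plausibly be modeled the same way, with entities in place of category labels.

```python
def category_rerank(results, query_categories, page_categories, boost=1.5):
    """Boost results whose categories overlap the query's categories.

    results: list of (url, score) pairs from the original ranking.
    query_categories: set of category labels assigned to the query.
    page_categories: dict url -> set of category labels for that page.
    boost: hypothetical multiplicative factor for a category match.
    """
    rescored = []
    for url, score in results:
        matches = page_categories.get(url, set()) & query_categories
        rescored.append((url, score * boost if matches else score))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


results = [("generic-page.example", 0.8), ("thomasnet.com", 0.6)]
query_categories = {"business directory"}
page_categories = {"thomasnet.com": {"business directory", "manufacturing"}}
print(category_rerank(results, query_categories, page_categories))
```

With these made-up numbers, a page that scored lower originally moves to the top because its category matches the category assigned to the query.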
Synonyms and Rankings
Google has several patents involving synonyms and meeting informational needs that describe how a page that ranks highly for one term might also rank highly for terms that are synonyms or that fulfill an equivalent informational need. So we see in the snippet above for Thomasnet.com, that on a search for [manufacturing directory], the word “business” is highlighted.
Google has told us in the past that when a synonym is used to determine the relevance of a result, they would highlight that synonym if it appears in the page title or snippet shown to searchers within a search result. See: Helping computers understand language
In that post, we are told:
Historically, we have bolded synonyms such as stemming variants, like the word “picture” for a search with the word “pictures.” Now, we’ve extended this to words that our algorithms very confidently think mean the same thing, even if they are spelled nothing like the original term. This helps you to understand why that result is shown, especially if it doesn’t contain your original search term. In our [pictures developed with coffee] example, you can see that the first result has the word “photos” bolded in the title:
Since the word “business” was bolded in the Thomasnet snippet on a search for [manufacturing directory], it seems that the page was likely determined to be relevant for that term partly because of the word “business” on the page.
Given both the Reasonable Surfer patent and the Phrase-Based indexing patents, it’s possible that hypertext relevance isn’t necessarily weighted the way most people think that it is.
Not every link using specific anchor text carries the same weight, and a link with anchor text that is “related” to the query a page is found for might carry more weight in helping the page rank for that term than other anchor text that isn’t related.
The phrase “business” co-occurs on very many highly-ranked pages in response to a query for [manufacturing directory], so those could be considered very related terms. Given that, if links are pointing to a site like Thomasnet.com that contain the word “business” or even “business directory,” under phrase-based indexing, those links are votes for the page being relevant for the query [manufacturing directory].
As I mentioned above, Google has published a large number of reranking approaches, and I probably could have covered several others that might have played a role in where the pages Rand pointed to were ranking.
Rand’s point about co-citations does really seem to be about co-occurrence, and how frequently terms tend to co-occur on the same pages (a link is a citation because it cites other pages, and Rand’s point is that these terms tend to appear together on pages without necessarily being links).
I don’t think that anchor text is dead, or that Rand’s presentation proved that. But maybe it’s a little different than most people might believe?
Thanks for starting this discussion, Rand. Trying to understand odd search results is always fascinating.
Happy Thanksgiving, everyone.
99 thoughts on “Not All Anchor Text is Equal and other Co-Citation Observations”
Thank you for this post, Bill. I now have a better idea as to why this one site I’m monitoring is ranking higher for a term that was used in a blog title ONCE than for keywords that are more relevant than that blog title term. But I also think that the structure/style/tone of the snippet of content, tag, description, etc. may also be responsible for its ‘weight’ in most cases.
If you were to search, for example, ‘watermelon martini’ you will get results with recipes of course, and an article or two about the joys of summer with your girlfriends. Look at a few more results down, and you also start to see ‘watermelon martini’ being beneficial to your health, packed with anti-oxidants, and it being the best kept diet secret. So how is it that a blogger who wants to live skinny in Iowa is outranking the leading vodka brand by 3 pages when they both have ‘watermelon martini’?
I have been reading this blog for 4 years, so should know better – but don’t.
Bill, please can you give a real basic 101 on what co-occurrence is.
Do we mean that it helps rankings to have related terms that match the meaning of the anchor text?
Can you explain in simple terms?
Excellent and very important post. Thanks so much for all this nice and free content. I can already imagine strategies after reading this article.
Link anchor text works just fine. Google may have changed the way it evaluates link anchor text many times through the years, but what we have seen this year is that Google has vastly improved the way it decides which links will pass value (and perhaps that may include deciding how much value each link should pass).
The old SEO link building strategies are faltering not because Google no longer trusts link anchor text but because Google is simply better at distrusting unearned link anchor text.
The evolution of Google’s filters and weightings appears to me to be designed to make their original idea (that links are recommendations) work when used with appropriate filtration (that is, by filtering out non-editorially bestowed links).
As much as I have criticized the original PageRank algorithm’s flawed premises, Google’s many layers of qualification have truly refined the premises. Now they are not saying that all links are recommendations and therefore should help in determining search relevance and importance — they are saying that search relevance and importance should be influenced by editorially-bestowed links that are intended to act as recommendations or at least as relevant referrals.
It’s a much better premise that makes it harder for manipulative links to change the mix. Manipulative links can still work but they have to work harder.
I’ve had mini-debates a couple times recently over the use of the phrase co-citation to describe this concept.
In my mind, it’s not traditional co-citation, since that’s focused on establishing a relationship between two webpages that are linked from a third (or more) webpages.
At the same time, I don’t think co-occurrence is a perfect term either, since that is more about establishing a relationship between two or more terms or phrases.
The concept Rand was referring to is about establishing a relationship between a link and the terms or phrases that appear near it, without being used in the anchor text. To me, it’s a 50/50 blend of co-citation and co-occurrence, and neither is a perfect descriptor.
So, I went searching for better terminology to use.
The first thing I came across was the term “Conceptual Links” in the SEOBook Glossary: http://www.seobook.com/glossary/#conceptual-links – the definition seems to describe the concept fairly well, but in a broader sense. I asked Aaron Wall where that phrasing came from but after tweeting back and forth with him I didn’t really establish a phrase for what I was referring to and the exchange drifted from the topic.
Around the same time, I was pointed in the direction of the original PageRank/Backrub paper at http://infolab.stanford.edu/~backrub/google.html – If you scroll down to section 6.1 (Future Work), it says “As for link text, we are experimenting with using text surrounding links in addition to the link text itself,” which establishes that this isn’t a new concept by any means. But, they don’t give a term for it and that’s the only thing I’ve found in any Google papers that mentions it (not that I’ve crawled through them all, by any means).
Finally, the only other published mention of this concept I’ve seen by a SEO comes from Ross Hudgens, in this comment from him on a blog post that he wrote (scroll up to read the post first, then his comment): http://www.rosshudgens.com/brand-anchor-text-a-link-building-hypothetical/#comment-20370 – Ross also uses the phrase co-citation.
All of that is a long-winded way of saying – I don’t think the phrase is incorrect, but it’s not perfect either, and I don’t have a better suggestion.
None of this is meant to refute or support the examples used by Rand. There was also some discussion on the inbound.org thread about those examples and what other factors were affecting them. Even if they were poor examples, I don’t think it strikes down the concept. Unfortunately, I think starting things off with “anchor text is dead” made the video more divisive than it should have been and detracted from a discussion of the concept itself.
For what it’s worth, I do believe that brands (as entities) can establish a semantic relationship through co-occurrence that increases their relevance to unbranded keywords. Separately, and often at the same time, I think Google certainly looks at phrases within a certain proximity of links, and perhaps applies them the same way they apply anchor text, albeit at a lesser weight. I could also see them weighting this factor heavier in a situation where they decide to not use the anchor text, or in a situation where the anchor text is branded.
The place I’ve seen co-occurrence used most prominently is in the phrase-based indexing patents from Google, though Google uses other algorithms that look at co-occurrence, too.
There are a couple of steps to phrase-based indexing and how it uses co-occurrence.
1. When Google crawls the web, it identifies “good” phrases that appear on pages. The phrase “President of the” is not a good phrase, because it’s incomplete. “President of the United States” is a good phrase because it covers a complete concept. A single word could be considered a phrase. Idioms that appear on pages, like “top of the morning,” or “quiet as a mouse,” aren’t going to be considered good phrases because they really could be used on any site and not add much meaning.
2. When there is a Google search for a query, Google might look at the top ten, or the top 100 (or the top 1,000) results to see what good phrases appear on those pages.
3. Phrases that tend to reappear on more than one page in that set of search results are said to “co-occur” for the query.
4. Sometimes you have queries that have more than one meaning, like “java,” which can mean the programming language, coffee, or an island in Indonesia. Because of that, Google might cluster together the documents in the search results that are similar in the words they use, and keep a different set of co-occurring good phrases for each cluster.
5. Phrases that co-occur across multiple pages in the set of results (or the clusters when there are multiple meanings) are considered related phrases to the query term.
6. Pages that tend to contain more of the co-occurring terms can be boosted in search results (at least up to a point; if the pages contain too many of the co-occurring terms based on a statistical analysis, they might be considered spam pages).
7. Links that point to those pages that use the related phrases within those query sets (or clusters) pass along more hypertext value than links that don’t use related phrases.
So co-occurrence refers to whether good phrases appear in multiple pages among the same search result set (or cluster of pages within that set). The fact that multiple good phrases co-occur on individual pages means that those pages are likely more relevant for those queries as well.
Brilliant observations, Bill. I can see a lot of hard work here.
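Steps 2 through 6 above can be sketched roughly in Python. This is a minimal illustration under stated assumptions, not the patented process: it skips step 1 (deciding which phrases are “good”) by taking a set of good phrases as given, skips the clustering in step 4, and uses a hypothetical `min_pages` threshold and a hypothetical `spam_ratio` cut-off in place of the patent’s statistical spam test. The example phrases are invented.

```python
from collections import Counter


def related_phrases(top_pages, good_phrases, min_pages=2):
    """Steps 2-5: good phrases that co-occur on at least min_pages of the top results.

    top_pages: list of sets, the good phrases found on each top-ranked page.
    """
    counts = Counter()
    for phrases in top_pages:
        counts.update(phrases & good_phrases)  # count each phrase once per page
    return {phrase for phrase, n in counts.items() if n >= min_pages}


def phrase_boost(page_phrases, related, spam_ratio=0.9):
    """Step 6: boost pages containing more related phrases, up to a point.

    spam_ratio is a hypothetical cut-off standing in for the patent's
    statistical test for pages stuffed with related phrases.
    """
    if not related:
        return 1.0
    ratio = len(page_phrases & related) / len(related)
    if ratio > spam_ratio:
        return 0.5  # suspiciously complete coverage: demote instead of boost
    return 1.0 + ratio


top_pages = [
    {"business", "suppliers", "manufacturers"},
    {"business", "suppliers", "industrial"},
    {"business", "catalog"},
]
good = {"business", "suppliers", "manufacturers", "industrial", "catalog"}
related = related_phrases(top_pages, good)
print(sorted(related))  # ['business', 'suppliers']
print(phrase_boost({"business", "widgets"}, related))  # 1.5
```

Links using the resulting related phrases (step 7) would then be the ones passing more hypertext value; that part isn’t modeled here.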
Thanks for the post
I wasn’t really sure that Rand provided a good definition of what he considered co-citation to be, and I was thinking of that traditional meaning that you refer to as “co-citation.” That’s why I included a link to Jim Boykin’s blog post from 2006.
It really wasn’t clear from the video that Rand was referring to terms that appear near links. That is one of the features that the Reasonable Surfer patent looks at as well when deciding how much weight to give to a link (from the patent, another one of the features is listed as, “the context of a few words before and/or after the link;”). So the occurrence of certain terms near links could potentially influence how much weight is passed through those links.
The term “conceptual links,” seems to fit in with the links (or relationship types) between conceptual nodes described in the Google patents from the Applied Semantics team after they started working for Google.
The PageRank paper’s mention of text surrounding links sounds like the feature I mentioned above in relation to the Reasonable Surfer patent.
In the comment from Ross, he does mention co-citation a couple of times, though he also uses the word “co-occur” in a way which seems to treat the two as equivalent. Not sure what to make of that.
Google’s PHIL (Probabilistic Hierarchical Inferential Learner) algorithm that attempts to classify content on pages for purposes of what advertisements to show on those pages via Adsense also refers to term co-occurrence as an approach to do that as well. See: Google’s Second Most Important Algorithm? Before Google’s Panda, there was Phil
I agree that starting the post off with “anchor text is dead” had a linkbait edge to it, and wasn’t really something that the post really brushed up against much, and didn’t really even try to prove.
Google’s patents on entity association are worth spending some time looking at in determining how those associations are created. See, for instance, Identifying Query Aspects
It’s really useful to have some ideas of what kind of reranking algorithm might cause some pages to appear in certain search results, especially when you’re doing something like keyword research or content development. 🙂
Very interesting read. I would like to ask you if you could explain the below phrase a bit more. Can’t really make sense of the second sentence.
“It used to be that when you looked at the cached copy of a page in a search, it would tell you whether or not the page ranked for your query terms if the terms didn’t appear on the page.”
Link text seems to be working fine for me, too. I agree completely that Google seems to be more intelligently using different filters and weightings to allow some anchor text to be recommendations, and to ignore others. There still is a lot of value to hypertext relevance once you start doing things like ignoring commercial keyword terms in name fields in blog comments, and other similar places. 🙂
You’re welcome. There’s more than just what appears in a snippet, but I had to point to them in the examples that Rand provided because they did show that those terms or synonyms for them actually appeared in the content of those pages, even if it was in places like the alt text on the Open Site Explorer page. Those pages definitely weren’t specifically optimized for those terms, though the Consumer Reports page appears to have been optimized for a fairly similar phrase.
That [watermelon martini] set of search results is showing some high-powered sites ranking first, but it is kind of fun to see the blog posts outranking the vodka brand by that much. I suspect that the vodka brand may have some SEO issues, and a harder time attracting editorially given and relevant links than the blog posts. Big brands sometimes take SEO for granted when they shouldn’t, and don’t create great content around the terms they would like to rank for.
Really interesting stuff. I have a client that, despite ranking on page one for secondary search terms and long tail search terms (3+ words), simply cannot get remotely close to page one for his main search term (2 words, national search, no geographical modifier). I’m going to apply some of the ideas that formed whilst reading this, certainly regarding the onsite internal linking structure. We played around with synonyms on another site purely by accident, as we were using a mix of industry keywords rather than what the research tools were telling us, and we had quite a bit of success too.
I guess it all boils down to writing content for the user rather than for Google.
Candice – Are you absolutely sure that the old keyword isn’t in the anchor text of any links going to the page? In my experience, the most likely anchor text you’ll see for a blog post is its name (as it is in the title), especially if you think of bookmarking sites, etc. Could it be the case that the old title is in the inbound anchor text and therefore a partial match to the keyword? Just a thought. 🙂
Bill – Great observations and insights. I did wonder if synonyms might have been playing a part here too (in addition to some partial match inbound anchor text, as I commented about on the post itself).
That is very interesting. So perhaps the key to ranking for a search term, then, would be to analyze search results and determine what phrase co-occurs with that keyword so as to give a page the maximum possible relevancy boost???
Would perhaps the same phenomenon hold true with out-bound links on a page???
I can see the WordPress SEO plugin developers adding a co-occurrence-term field to their plugins now and assigning it a quality score.
Hey Bill, I completely agree with you, because I have faced the same thing on my website: one keyword was ranking between 1 and 20 with the help of relevancy, and after reading this I realized that there are so many algorithms at work that many times I don’t recognize what is happening or why rankings go up or down.
I was working on “web design services,” and “creative web design services” was ranking well in the top 15 without link building, so it’s very difficult to make sense of the results from Google.
Does co-citation work only through relationships with users, or are other things affected too? Google has associated names, anchor text, and links with their respective domains.
What do you guys think about that?
Great overview of a wide variety of info on co-occurrence.
For an interesting tool that people can check out if they want to get a sense of what textual co-occurrence is about, the Latent Semantic Analysis tools at University of Colorado are a lot of fun, they analyze co-occurrence based on various corpuses (collections of documents).
You have to put an extra carriage return in between terms, try for instance entering in their “Matrix Tool” the following terms:
And you’ll see that, as is intuitive, cop is more closely related to policeman than it is to judge and so on (of course, cop is 100% related to itself). But notice that Coroner’s highest relationships are to both doctor and cop, which makes sense. These tools are a good way to get your head around the concept of co-occurrence.
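The University of Colorado tools run full Latent Semantic Analysis, which adds a dimensionality-reduction (SVD) step on top of raw counts. As a much simpler illustration of the underlying idea only, the sketch below treats two terms as related when they co-occur with the same other terms across a tiny, made-up corpus, and compares them with cosine similarity. The corpus and function names are hypothetical, and this is not LSA itself.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_vectors(documents):
    """Map each term to a Counter of the other terms it shares a document with."""
    vectors = {}
    for doc in documents:
        terms = set(doc.lower().split())
        for term in terms:
            vectors.setdefault(term, Counter())
        for a, b in combinations(sorted(terms), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: one string per "document".
corpus = [
    "cop policeman arrest suspect street",
    "cop policeman patrol street",
    "judge court verdict lawyer",
    "judge cop court testimony",
    "coroner doctor autopsy body",
    "coroner cop body crime",
]

vectors = cooccurrence_vectors(corpus)
for other in ("policeman", "judge", "coroner"):
    print(other, round(cosine(vectors["cop"], vectors[other]), 3))
```

With this corpus, “cop” scores closer to “policeman” than to “judge,” and a term compared with itself scores 1.0, loosely mirroring the Matrix Tool behaviour described above.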
There are a lot of misnomers and myths about Latent Semantic Analysis – some people in the industry mistakenly call keyword density analysis of top SERP results “LSI” which is totally wrong.
Other techniques besides LSA that are related are “Latent Dirichlet Allocation” and “Principal Component Analysis”.
People who say “Google doesn’t use LSA” (or something like it) are IMO wrong; Google itself disclosed in the paper “Predicting Bounce Rates in Sponsored Search” that for that particular exercise:
“The related terms were derived from the parsed terms using a transformation φ(·), using a proprietary process similar to term expansion via latent semantic analysis”
You raise a very interesting point about Google possibly using co-occurrence within its analysis of anchor text; that is a really interesting concept.
Anchor text has been losing importance as a signal anyway. Google has got better at understanding the nature of a page. In many of my searches Google seems to give more weight to short-tail keywords in a long tail query rather than finding relevancy for the long tail search in itself.
As for co-citation and citation, it really only works for the homepage of a site and so can never replace anchor text as a relevancy signal for internal pages.
No one seems to have brought up the fact that as social media sites such as Twitter become more popular, the short-link (e.g. bit.ly/abc) is replacing the modern anchor text link. This could make it increasingly hard for Google to use backlinks to find relevancy in search.
Wow, way to take a great post and expand on it phenomenally. Co-citation is definitely a factor in rankings, but not a replacement for anchor text, just as social signals didn’t replace links. They all hit an equilibrium at some point.
Great WBF by Rand (although a bit different seeing him without the facial hair). Anyway, although in the long term anchor text may be devalued – at least comparative to past and present – for the time being it’s still working. Particularly important for brands that incorporate keywords in their name and/or site, it will continue to be relevant.
For text surrounding links, I’ve always preferred “link context.” Seems intuitive. Good discussion here.
Excellent observations Bill and I agree with Kane that the use of text surrounding the anchor text may also be relevant to a search engine.
Synonyms, Phrase Based Indexing, Entities and Category Based Ranking are all probably part of the reason for these results to varying degrees. As Michael states, this doesn’t mean anchor text is going away, it’s just getting smarter.
I think the academic side is covered here so I’m just going to talk about some hacks and observations.
Using Google Sets in Google Sheets the terms ‘cell phone’ and ‘ratings’ produce the term ‘consumer reports’ in the first 10 terms. Whether it’s entity or category based I think it’s clear that the synonyms of reviews and ratings are highly associated with Consumer Reports.
Using Google Sets in Google Sheets the terms ‘manufacturing’ and ‘directory’ produce the term ‘suppliers’ in the first 10 terms. This term is all over the ThomasNet page.
Related searches for ‘backlink analysis’ return ‘website explorer’ as related (and place OSE first, highlighting ‘Site’ and ‘Explorer’). Furthermore, drilling down, ‘backlink explorer’ is a related term to ‘website explorer’. While I think category based reranking might be at play here, this is the one term that might be showing the influence of co-occurrence (or co-location).
It’s also fun to see what Google returns (in bold) when doing a synonym negative search for each query component (~ratings -ratings).
Search, it’s never boring.
Bill, wonderful drill down on this! I came directly to your blog after seeing Rand’s video (even before I noticed that he gave you a citation :)), mostly because I didn’t buy the “co-citation” theory as the sole explanation for the 3 examples he gave. Now, knowing the resources SEOMoz has at their fingertips (OSE-type functionality at levels certainly unavailable to the rest of us), I would hesitate to challenge his conclusions. However, something like your “Reranking for Categories” explanation does, for me, a better job of passing the Occam’s Razor test. Clearly these 3 sites have tremendous authority in these areas. Free association with “Manufacturing Directory,” even 20 years ago, by anyone who purchased manufacturing products and services, would have come up with “Thomas Register” first, and now, by proxy, with thomasnet.com. If this site has not been generally classified as a “manufacturing directory” then, as you pointed out, “Google should start over.” It seems equally likely that one of the categories “cell phones” must be classified under is something like “consumer goods” or “consumer products.” Again, if you had asked anyone 20 years ago where they would go for a “rating” on a “consumer product,” Consumer Reports would have been the expected answer.
Thanks again – great job.
Have seen the WhiteBoard Friday video myself, and am still trying to digest all the insights that Rand shared. Your post above also has some interesting insights. Panda, Penguin and what’s next? Will Google’s next update be named Pony?
Relevance has always mattered and still does. Mentioning and linking to related sites is a win/win but too few do it because they’re afraid of competition.
Your competitors are NOT your enemy – corporations including Google are. This new revelation about co-citations is one more way to make it even harder to figure out why sites rank – or don’t. Running link reports on your competitors won’t be much help if links are not why they’re ranking above you.
Will SEOs create tools to find citations on other sites now? While you are all busy trying to figure out how to stay in Google’s good graces they are busy systematically taking converting keyword phrases away from small businesses, affiliate marketers and bloggers and handing them to their big brand buddies.
If you want the freedom to make a living or choose to shop at small businesses instead of big brands it is about time you realize that they’re all going to go under if we don’t move as many shopping dollars to them as possible and stop supporting corporations.
You want a better economy and more jobs that pay a living wage? Then stop buying from big brands and promoting third parties like Amazon and buy from small businesses instead.
I just read about this on AJ Kohn’s G+ page. I think you have a point about using the LSI words right in the hyperlink as anchor text. The problem is, how many webmasters would do that on their own? Seems like the only “natural link” is a naked anchor and everything else is suspect and subject to a “jarring and jolting” penalty when Google really hits us with the next Penguin.
Great analysis of Rand’s Whiteboard Friday. I tend to agree that the context of the content surrounding the link affects how a search engine evaluates the link.
In addition to a comment I left on Inbound, I’d also like to add that this post on Moz: http://www.seomoz.org/blog/guide-to-competitive-backlink-analysis links to OSE. That Moz post is all about backlink analysis (uses it in the title and URL etc). That post ALSO ranks #3 for “backlink analysis”. I’d be very surprised if that post alone doesn’t help increase the relevancy of “backlink analysis” for OSE.
Also, a related: operator search on that post reveals OSE as about the 14th result (screencap): http://screencast.com/t/nrhTZyMtSD – I’ve always thought the related: operator gives very entity-like results.
Your comments are great. You should take them and re-edit them into a separate post because they were that informative. I’ve been making sure my anchor text is related to the rest of the content on my page/post for a while now.
Really like your breakdown of Rand’s Whiteboard Friday. The relevance of text surrounding anchors is something we seem to discuss a lot here. As long as your content is relevant, written naturally and adding something of value, surely you will be hitting those co-occurring phrases.
Wow, detailed analysis that’s going to take a couple of reads to really soak up. Been talking a lot at work about anchor text and whether it’s still good or becoming a bad thing as it probably is prone to misuse by black hat link building. I think context has been important for a long time but there’s some interesting angles and perspectives in here I hadn’t really thought about, thanks for a great post.
I think your article entered the bottomless pit of anchor text: the more it’s delved into, the more difficult it seems to appraise. In an ideal world a co-citation link would be a natural way of adding extra useful information to a webpage, similar to Wikipedia’s usage; hey, that’s maybe why the site ranks so strongly in Google Search.
Man, I know this sounds like how my Dad used to talk, but LinkMoses is getting cranky. I’ve been preaching about the uselessness of pursued anchor text since before many of you were in high school, including Rand. The most overlooked aspect of the anchor-text-as-signal debate is the credibility and historical utilization of anchor text by the site granting it in the first place. And the higher up the content credibility food chain you go, the less likely you are to see keyword anchor text in use. In fact, there are places where you DON’T want anchor text, even though you may think you do. Anchor text obsession has destroyed many link profiles. I’ve sent out a few hundred thousand link requests over the years, and have never once asked for anchor text. Not once. So if it occurs, it’s natural and editorially selected and placed, not manipulated. Lastly, remember Google has historical anchor text data going back to cave man days. They know more about it than any of us ever will.
Nice work Bill. Thanks for sharing. The world of seo keeps on changing like life and we must adapt.
Great detailed article. However, it seems like it’s a concept that might be easy to manipulate, much like targeted anchor text links.
Yes, it’s quite possible that synonyms may have played a role as well in determining that one or more of those sites may have ranked for the query terms in question. I did include that in the list of possibilities that I wrote about above. It does seem like Google was considering “business” as a synonym for “manufacturing,” at least within the context of a “business directory” and a “manufacturing directory” being synonyms for each other.
See: More Ways a Search Engine Might Identify Synonyms to Expand Queries With
Hi Azanka,
I’ll try. When you performed a search for a specific query, sometimes a result didn’t seem like it would rank well for the query terms, and you might not even see highlighted terms in the page title and snippet that matched the query.
If you looked at the cached copy of the page in those search results, Google would show a message at the top of the cached page that might tell you something like, “the query terms XXXX XXXX only appear in links pointing to this page, and do not appear on the page itself.” It doesn’t look like Google tells you that anymore, based upon a number of cached copies of pages that I’ve looked at.
Hi Steve Grady,
I’ll look forward to hearing the results of your efforts if you are willing to share them.
Relevance is more than just matching keywords to a page. If the pages within a set of search results provide what the search engine thinks matches the intent of a query, one or more of those pages might not include the query terms and still be a good result for the search.
Yes, looking at which phrases tend to co-occur within a set of search results (top 10, or top 100, or top 1,000, or some other number), or within the different clusters of those results if there are multiple possible meanings for the query, might help boost a page that used those phrases. As I noted in writing about the phrase-based indexing patents, though, too many of those co-occurring phrases (as determined statistically) might be a sign that someone is trying to spam search results, for instance by scraping the content of a bunch of the top-ranking results for that query. I can’t tell you with any certainty where the line might be between including enough co-occurring phrases to boost a page in results and including so many that it seems like an intent to spam.
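To make the “too many co-occurring phrases” idea concrete, here is a minimal sketch, assuming a simple statistical test: count how many of a given list of co-occurring phrases each result contains, and flag any document whose count sits far above the rest of the set. The z-score test, the `z_threshold` of 2.0, the function names, and the sample pages are all hypothetical; as noted above, nobody outside Google knows where the real line is.

```python
import statistics


def phrase_hits(text, phrases):
    """Number of distinct co-occurring phrases that appear in the text."""
    lowered = text.lower()
    return sum(1 for phrase in phrases if phrase in lowered)


def flag_phrase_stuffing(docs, phrases, z_threshold=2.0):
    """Flag documents whose phrase count is far above the rest of the result set.

    z_threshold is an arbitrary assumption, not a known Google value.
    """
    counts = {name: phrase_hits(text, phrases) for name, text in docs.items()}
    values = list(counts.values())
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return set()  # every document looks the same; nothing stands out
    return {name for name, c in counts.items() if (c - mean) / spread > z_threshold}


# Hypothetical result set for "baseball stadium".
phrases = ["dugout", "home plate", "first base", "bullpen", "pitcher's mound", "outfield seats"]
results = {
    "a": "A tour of the stadium and the dugout",
    "b": "Tickets near first base",
    "c": "Parking near the home plate entrance",
    "d": "The bullpen is beside the outfield",
    "e": "Food options behind home plate",
    "f": "Seats along the first base line",
    "g": "A history of the dugout",
    "scraped": "dugout home plate first base bullpen pitcher's mound outfield seats",
}
print(flag_phrase_stuffing(results, phrases))
```

Here only the page that strings every phrase together stands out; the pages that mention one related phrase each are left alone.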
Not quite certain what you mean by this question.
If a search engineer were the one writing that hypothetical plugin you mention, it might be helpful. But it’s possible that a combination of different ranking algorithms, rather than just one, is responsible for the rankings of the particular results that Rand identified. Some of the reranking approaches I mention above may use co-occurrence in part to do what they do (phrase-based indexing, for instance), others may include co-occurring terms (reasonable surfer, perhaps), and others don’t necessarily imply the use of co-occurring terms (entity association and synonyms within context).
Under phrase-based indexing, someone performs a search for “baseball stadium.” Google returns an initial set of results based upon an information retrieval score (on page factors and anchor text pointing to the page) and an importance score (PageRank). It looks through the results (let’s assume that all of the results are actually about baseball stadiums so that there’s just one cluster of results). It finds all of the good phrases in that set of results, and reranks them so that the pages that include more co-occurring phrases are boosted in the results from their original place within those results. If the phrases “pitcher’s mound,” “dugout,” “bull pen,” “home plate,” and “first base,” were the most popular good phrases in those results, then pages where many of those terms co-occurred would be the ones that might be boosted the most.
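The baseball stadium walkthrough above can be sketched roughly as follows. This is a simplified illustration, not the patented method: the patents describe selecting “good phrases” with statistical measures, while this sketch just ranks a hand-picked list of candidate phrases by how many result pages contain them, then multiplies each page’s initial score by a fixed, invented boost per phrase present. The URLs, scores, and `boost_per_phrase` value are hypothetical.

```python
from collections import Counter


def cooccurring_phrases(results, candidates, top_n=5):
    """Candidate phrases ranked by how many result pages contain them."""
    doc_freq = Counter()
    for result in results:
        text = result["text"].lower()
        for phrase in candidates:
            if phrase in text:
                doc_freq[phrase] += 1
    return [phrase for phrase, _ in doc_freq.most_common(top_n)]


def rerank(results, phrases, boost_per_phrase=0.1):
    """Multiply each initial score by (1 + boost_per_phrase * phrases present)."""

    def boosted(result):
        text = result["text"].lower()
        hits = sum(1 for phrase in phrases if phrase in text)
        return result["initial_score"] * (1 + boost_per_phrase * hits)

    return sorted(results, key=boosted, reverse=True)


# Hypothetical results for "baseball stadium"; initial_score stands in for the
# information retrieval score combined with PageRank.
results = [
    {"url": "a.example", "initial_score": 1.00, "text": "Stadium seating chart and parking"},
    {"url": "b.example", "initial_score": 0.95,
     "text": "From the dugout to home plate and first base, a stadium tour with a bullpen visit"},
    {"url": "c.example", "initial_score": 0.90, "text": "Seats behind home plate and near the dugout"},
]
candidates = ["dugout", "home plate", "first base", "bullpen", "pitcher's mound"]

phrases = cooccurring_phrases(results, candidates)
print([r["url"] for r in rerank(results, phrases)])  # ['b.example', 'c.example', 'a.example']
```

The page that starts second moves to first because it contains the most co-occurring phrases, while the page that mentions none of them drops to the bottom.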
The search engine might also look at the links to all of the pages in that result set, and if the anchor text in those links used the co-occurring (or related) phrases, those pages might be seen as more related to the query as well, based upon the anchor text pointing to them.
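Extending the same hypothetical example to inbound links, one simple way to picture this is to add a small bonus for each inbound anchor that contains one of the co-occurring phrases. The additive bonus, the `weight` of 0.05, and the sample pages are assumptions made for illustration only.

```python
def anchor_phrase_matches(anchors, phrases):
    """Count inbound anchors that contain at least one co-occurring phrase."""
    return sum(1 for anchor in anchors if any(p in anchor.lower() for p in phrases))


def rank_with_anchor_text(pages, phrases, weight=0.05):
    """Add a small, hypothetical bonus per matching inbound anchor, then sort URLs."""

    def score(url):
        page = pages[url]
        return page["score"] + weight * anchor_phrase_matches(page["anchors"], phrases)

    return sorted(pages, key=score, reverse=True)


phrases = ["home plate", "dugout", "bullpen"]
pages = {
    "a.example": {"score": 1.00, "anchors": ["click here", "stadium info"]},
    "b.example": {"score": 0.95, "anchors": ["seats behind home plate", "Dugout tour", "bullpen seats"]},
}
print(rank_with_anchor_text(pages, phrases))  # ['b.example', 'a.example']
```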
This isn’t the same as entity association, where a search engine might determine that a specific term or phrase might be associated with a particular business or domain name.
Thanks. Looking at LSI in depth can give people an idea of another approach to co-occurrence. I don’t think that Google is using the version of LSI described in the original 1990 paper on latent semantic indexing, but as the quote from the Google paper notes, what it uses could potentially be “similar.” Google has mentioned probabilistic latent semantic indexing in a number of patents and papers, and that approach has the benefit of scaling well beyond the original concept of LSI.
One of the points behind the Predicting Bounce Rates in Sponsored Search (pdf) paper was in using predictive algorithms in a way that would scale to the use of very large data sets that used machine learning approaches.
I don’t think that anchor text is losing its value. I do think that the search engines are countering its abuse in a number of ways, such as the processes described in phrase-based indexing and the reasonable surfer model. I also don’t think that home pages are treated any differently by those approaches than any other pages on a website.
Short links in social media sites probably haven’t had much of an impact for sites like Twitter which have nofollow attributes attached to them. But, even if they didn’t have a rel=”nofollow,” I don’t think that the search engines would really pay much attention to them when it came to hypertext analysis anyway.
Thanks. Rand mentioned in a tweet pointing to this post that the term he was thinking of was co-occurrence rather than co-citation, but regardless of the exact term, I think the important part is the idea that the terms would appear together within the same documents on a regular basis (regardless of whether one of the terms is in a link). That tendency for terms and phrases to appear together in documents points to a semantic relatedness.
That Google might look at documents that are among the top ranking pages for a specific query (under phrase based indexing), to hunt for co-occurring phrases within that set of search results, and boost results where those co-occurring phrases appear together within the same documents in that set of results means that there is a relationship between the query terms and those phrases.
Where it does relate to anchor text is that links to pages in those search results that use the co-occurring phrases may help those pages rank higher for the query term.
So, if a lot of the top (10/100/1,000) search results for “manufacturing directory” include phrases that appear on the front page of thomasnet.com, and thomasnet.com is a result for the query “manufacturing directory,” then it might be boosted in the search results for that query. If a lot of the anchor text pointing to the thomasnet.com home page also uses those co-occurring (or related) phrases, then it might be boosted even more within those search results.
Not sure that the original White Board Friday really proved in any way that anchor text has been devalued. But, it is possible that some of the reranking approaches used by Google that I’ve described (phrase based indexing, reasonable surfer model) may mean that anchor text within different contexts may carry different amounts of relevance weight.
It’s definitely worth thinking of anchor text and exact match domains or business domains. I did find it interesting that the Reasonable Surfer model discussed that one of the things that they might look at in determining how much weight a link might carry could depend in part on how commercial the text is within a link. That reminded me of Google’s patent on exact match domains, and how commercial the anchor text might be in those. See: Google’s Exact Match Domain Name Patent (Detecting Commercial Queries)
Google Sets is another approach from Google that looks at terms that tend to co-occur. In those, Google looks for terms that appear within lists on webpages (or are collected in a way that could be said to be a list, even if they don’t necessarily use HTML list elements). Based on showing a couple of examples of terms that might be related, Google Sets will show other terms that tend to co-occur within the same lists. They could be considered to be semantically related as common members of lists that show up in high enough volume. Great example of co-occurrence of terms.
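Google never published the internals of Google Sets, so the following is only a sketch of the general idea described above, under stated assumptions: given lists harvested from pages and a few seed terms, score every other term by how often it shares a list with the seeds, counting lists that contain more seeds more heavily. The weighting scheme, the function name, and the sample lists are hypothetical.

```python
from collections import Counter


def expand_set(lists, seeds, top_n=5):
    """Rank non-seed terms by how often they share a list with the seed terms."""
    seed_set = {s.lower() for s in seeds}
    scores = Counter()
    for items in lists:
        members = {item.lower() for item in items}
        overlap = len(seed_set & members)
        if overlap == 0:
            continue  # this list has nothing to do with the seeds
        for member in members - seed_set:
            scores[member] += overlap  # lists sharing more seeds count more
    return [term for term, _ in scores.most_common(top_n)]


# Hypothetical lists, as if harvested from <ul>/<ol> elements or list-like text.
lists = [
    ["Red", "Blue", "Green", "Yellow"],
    ["red", "green", "blue"],
    ["blue", "green", "purple"],
    ["red", "apple", "banana"],
]
print(expand_set(lists, ["red", "blue"]))
```

“Green” comes out on top because it shares the most lists with both seeds, while “apple” and “banana” only ride along on a single list containing one seed.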
I agree that many of these different reranking approaches might be at play here, and that co-occurrence is one of the signals that likely plays a role in some of them, as a semantic signal.
Thanks for sharing your observations regarding sets and synonyms.
I like “link context” to describe the words around links that might be associated with them. That’s definitely a concept that could be used to describe co-occurrence of words with anchor text in links. It’s not the same use of co-occurrence as described in phrase-based indexing, but it’s definitely another type of co-occurrence that could be used by Google.
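As a rough illustration of what “link context” could mean in practice, this sketch pulls each link’s anchor text plus a few words on either side of it. It uses regular expressions for brevity, which is fragile on real-world HTML (a proper HTML parser would be safer); the `window` size and the sample markup and URL are hypothetical.

```python
import re

TAG = re.compile(r"<[^>]+>")
LINK = re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def link_contexts(html, window=5):
    """Return each link's href, anchor text, and up to `window` words on either side."""
    if window < 1:
        raise ValueError("window must be at least 1")
    contexts = []
    for match in LINK.finditer(html):
        # Strip tags from the surrounding text, then keep the nearest words.
        before = TAG.sub(" ", html[:match.start()]).split()[-window:]
        after = TAG.sub(" ", html[match.end():]).split()[:window]
        contexts.append({
            "href": match.group(1),
            "anchor": " ".join(TAG.sub(" ", match.group(2)).split()),
            "before": before,
            "after": after,
        })
    return contexts


html = ('<p>For a full audit, try our backlink analysis tool '
        '<a href="https://example.com/ose">Open Site Explorer</a> '
        'which shows inbound links.</p>')
for ctx in link_contexts(html):
    print(ctx["anchor"], ctx["before"], ctx["after"])
```

In this example, the phrase “backlink analysis” never appears in the anchor text itself but does sit right next to it, which is the kind of signal the link context idea describes.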
It’s also something that might be tied to another potential future ranking signal by Google that might look at the words around links to try to understand the sentiment associated with those links. A recent hire by Google was one of the authors of the following paper that does something like that:
Sentimental Spidering: Leveraging Opinion Information in Focused Crawlers (pdf)
So that’s another approach to co-occurrence that might also influence how search results are ranked and displayed in search results as well.
Rand looks like a cross between Mr. Bean and Prince Charles without facial hair lols
Thanks. The category reranking approach does seem like a good explanation for those pages being boosted higher in the results for all three sets of query terms and pages returned. The pages all have enough PageRank and are somewhat relevant for the query terms, but maybe shouldn’t be ranked as highly for those terms as they are, based on those factors alone.
Chances are that they may also benefit from other reranking approaches as well, including some of the ones that I’ve mentioned above.
If we take Rand’s “co-citation” as “co-occurrence,” then the way that co-occurrence is used in things like phrase-based indexing could quite possibly play a role as well. It’s not odd to think that pages classified as fitting into specific categories are going to include terms and phrases that tend to co-occur either. 🙂
I agree with your approach of looking at multiple potential reasons, and applying something like Occam’s Razor to see which explanation or explanations are most likely behind what we are seeing. In these cases, I think it’s possible that a combination of reranking approaches is responsible for the results we are seeing.
Glad to hear that we both gave you lots to think about.
Not sure that I want to see a Google upgrade named “pony.” 🙂
Google’s algorithms and processes described in papers and patents and blog posts are primarily aimed at helping searchers find what they are looking for, and if those approaches also make it harder for people who try to make pages appear more relevant for something than they actually are, I don’t see that as a bad thing.
I also don’t think that Google has a “brand bias” aimed at taking traffic away from small businesses and bloggers, and making it easier for big businesses to rank ahead of them. In fact, I think the way search engines rank things often is more biased towards small businesses and bloggers. Google doesn’t “favor” its own properties when it includes search results for different vertical searches in search results. Instead, it offers those results because it has algorithmically determined that searchers likely want to see those results.
I don’t think that anyone here is talking about “LSI Keywords.” There have been people who have sold “products” or keyword research approaches based on what they referred to as “LSI” that really wasn’t. LSI is a method of indexing words and phrases, not a method of generating keywords. LSI was first described in a paper written in 1990, for use with small sets of documents (about 10,000 or so) that don’t link to each other and don’t change (the way that the web does).
The web is too large a document set to use LSI, and it changes too frequently to use LSI (imagine having to rerun LSI every time a single word changed on a document on the Web).
People who sell LSI keyword research tools do not have access to Google’s index of the Web, and will likely never have access, so they base their “LSI keywords” approach on a small set of supposedly representative documents. That doesn’t truly represent the Web, and therefore has no value.
Your links in your hypertext should ideally be descriptive of what you are linking to so that you’re giving people who follow that link an idea of what they might find when they click on that link. There’s no problem with linking just using the URL for the page either.
But when you do things like create lots of low quality pages, and try to manipulate rankings with those pages by using the same anchor text over and over on those low quality pages (or doorway pages that the low quality pages all link to), then the search engines are going to treat your links as suspect.
Thanks. It is quite likely that Google does look at content that surrounds a link, and that text may influence the weight of that link to a degree. But as I described in my post, that may only be one type of co-occurrence that Google is considering when ranking pages, and when determining how much hypertext relevance is passed along in a link.
Thanks for pointing those out. The related SEOmoz page that links to OSE could definitely be one of the things that helps OSE rank for “backlink analysis.”
The “related” results use the kind of co-citation that involves pages that tend to be linked to by other pages. Page A and Page B are both linked to by Pages C, D, E, and F. Because A and B are linked to (or cited) by a lot of the same pages, they are said to be related. This kind of co-citation isn’t the same kind of co-citation that Rand was referring to, but I agree that it’s another way to find pages that tend to be about the same things. We don’t know if this kind of co-citation is used as a ranking signal, but it often does seem to help in finding pages that are related on a larger scale of granularity.
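The link-based kind of co-citation described above is easy to sketch: for every pair of pages, count how many other pages link to both of them. The page names and link graph below are hypothetical, mirroring the Page A / Page B example; whether Google uses anything like this as a ranking signal is, as noted, unknown.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_counts(outlinks):
    """Count, for every pair of pages, how many pages link to both of them."""
    counts = defaultdict(int)
    for cited in outlinks.values():
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return counts


def related(page, counts, top_n=3):
    """Pages most often co-cited with `page`, strongest first (ties by name)."""
    scores = {}
    for (a, b), n in counts.items():
        if a == page:
            scores[b] = n
        elif b == page:
            scores[a] = n
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


# Pages C-F each link to both A and B, as in the example above; E and G add noise.
outlinks = {
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A", "B", "Y"},
    "F": {"A", "B"},
    "G": {"A", "X"},
}
counts = cocitation_counts(outlinks)
print(related("A", counts))  # ['B', 'X', 'Y']
```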
Thanks. I’ve been tempted more than once to write a post in response to a comment rather than publish it as a comment. In most cases though, I’d rather try to write a response that answers or addresses the comment on the same page instead.
I try to use anchor text related to the content I’m writing most of the time. If the things you’re linking to on a page or post are relevant to what you’re writing, I’m not sure that should be too hard.
I really like this old article from Jared Spool titled The Right Trigger Words, and the idea that anchor text should give people some idea of what they will see when they click on a link.
Thanks. As I’ve written a number of times above, co-occurrence can be used in a number of ways, and it’s not just a matter of certain words or terms appearing near links. If you’re writing about a specific topic, and you do research to make what you’ve written well balanced (like a journalist expanding a story he or she is writing by answering who, what, where, why, when type questions), you’re going to naturally include terms and phrases that tend to co-occur naturally.
So if you are going to optimize a page for a specific term, and you do a search and look at the topics and concepts that people cover in a number of the top ranking pages, and keep on seeing terms and phrases that appear in those pages, you might want to consider using some of those co-occurring terms and writing about some of those topics and subtopics.
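A minimal sketch of that research step, under simplifying assumptions: treat each top-ranking page as plain text, count how many pages each word appears in, and list the words that show up in at least half of them but are missing from your draft. It works on single words only, with no stemming and a deliberately tiny stopword list. The `min_share` cutoff and the sample pages are hypothetical.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; a real one would be much longer.
STOPWORDS = {"and", "the", "our", "for", "with", "every", "of", "we"}


def terms(text):
    """Lowercased words of 3+ letters, minus stopwords (no stemming)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def missing_cooccurring_terms(top_pages, draft, min_share=0.5):
    """Terms found in at least `min_share` of top pages but absent from the draft."""
    doc_freq = Counter()
    for page in top_pages:
        doc_freq.update(terms(page))
    cutoff = min_share * len(top_pages)
    present = terms(draft)
    return sorted(t for t, n in doc_freq.items() if n >= cutoff and t not in present)


top_pages = [
    "Battery life and camera quality drive our ratings",
    "We test the camera, battery and price of every phone",
    "Price, battery and carrier coverage compared",
]
print(missing_cooccurring_terms(top_pages, "Phone price guide"))  # ['battery', 'camera']
```

The output is a list of topics worth considering, not words to paste in; the point, as above, is to cover the subtopics that top pages tend to share.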
Thanks. The way that search engines are using anchor text might not be the way that we might guess they do. It is one of those aspects of the rankings of pages that we may tend to take for granted and think about in a fairly simple manner, but things like phrase-based indexing and the reasonable surfer model can change how it might be used. There’s no 100% guarantee that the patents describe exactly what Google is presently doing, but if we acknowledge that there’s a very real possibility that anchor text in one link from one page doesn’t necessarily carry the same weight as anchor text in another link from another page, then I think we are looking at it in a more realistic manner.
I’m not sure I understand your comment about the “bottomless pit of anchor text.” I do know that anchor text isn’t something that Google is going to stop using tomorrow, but it’s also something they are trying to be smarter about using.
Looking at Rand’s video, he wasn’t necessarily writing about “co-citation links” either, or even “co-citation.” I’ve made it pretty clear that the concept of co-occurrence can be used to determine how relevant a page might be for a term without even the involvement of links or anchor text.
I’m not completely sure I understand your example of Wikipedia, and how you think they might be using “co-citation,” to rank higher. Care to expand upon it?
I’m a fan of writing things that attract links rather than asking for them. Write something that people find valuable, useful, interesting, engaging, and that they would want to share with others. Try to create a descriptive title for your page or whatever you’ve created, and people will often, but not always, use that title as the link text if they decide to link to you.
I’ve never asked anyone for specific anchor text in links, or ever suggested doing so to anyone that I’ve worked with to build a link strategy for specific pages of their sites.
Definitely a question that should be asked. I think it’s important when looking at any potential ranking signal how that signal might possibly be manipulated.
I’m not sure that I agree that any of the reranking approaches I’ve mentioned above are the kind of thing that makes it easier to manipulate search results and rank for things. For many of them, they go far beyond things like forum signature and blog comment spam where people try to create as many links as possible to pages using specific keywords, or create doorway pages aiming specific anchor text at pages.
Thanks. Our ability to change and adapt when it comes to SEO is essential. The search engines are actively working on trying to make search better, and searchers are demanding it – they want higher quality results that fill their situational and informational needs, and they want them faster, too. 🙂
Thanks for the feedback.
I think I was coming from the angle of how anchor text is interpreted, this is the bottomless pit bit; picking up on these bullet points:
1. Actual words in the anchor text associated with the link
2. Commerciality of the anchor text associated with the link
3. A topical cluster with which the anchor text of the link is associated
4. A degree to which a topical cluster associated with the source document matches a topical cluster associated with anchor text of a link
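The four link features listed above appear to come from the reasonable surfer patent, which describes predicting how likely a link is to be followed and weighting the value it passes accordingly. The sketch below assumes invented 0..1 encodings for each feature and made-up weights fed into a logistic function, including the assumption that more commercial anchor text lowers a link’s weight; none of these numbers come from Google.

```python
import math
from dataclasses import dataclass


@dataclass
class LinkFeatures:
    # Each field is a hypothetical 0..1 encoding of one item in the list above.
    anchor_words: float   # 1. how descriptive the anchor words are of the target
    commerciality: float  # 2. how commercial the anchor text is
    anchor_topic: float   # 3. strength of the anchor's topical cluster
    topic_match: float    # 4. source page cluster vs. anchor text cluster match


# Invented weights; their signs and sizes are assumptions, not published values.
WEIGHTS = {"anchor_words": 1.5, "commerciality": -2.0, "anchor_topic": 1.0, "topic_match": 2.0}
BIAS = -1.0


def click_likelihood(features):
    """Logistic score standing in for 'how likely a reasonable surfer follows this link'."""
    z = BIAS + sum(w * getattr(features, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def distribute_rank(page_rank, links):
    """Split a page's rank across its links in proportion to click likelihood."""
    scores = {url: click_likelihood(f) for url, f in links.items()}
    total = sum(scores.values())
    return {url: page_rank * s / total for url, s in scores.items()}


links = {
    "editorial": LinkFeatures(0.8, 0.1, 0.9, 0.9),
    "footer_ad": LinkFeatures(0.3, 0.9, 0.2, 0.1),
}
shares = distribute_rank(1.0, links)
print({url: round(share, 3) for url, share in shares.items()})
```

An on-topic editorial link ends up carrying most of the page’s value, while a commercial, off-topic footer link carries little, which illustrates why the same anchor text might not count the same everywhere.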
We’ve always been aware that in the past a link was improved with useful anchor text, but it seems Google is pulling away from that. We’ve also been aware of where a link (with relevant anchor text) is placed, i.e. is it surrounded by content associated with the anchor text, is the link bold or italic. Basically, in this vein the construction of a link to another site almost becomes a bottomless pit because there are so many variations. I’m also aware all of this is subject to discussion, as there are no hard and fast rules.
I’m aware the above is a little off the track from co-citation, as co-citation:
“Bibliographic coupling or co-citation occurs when two works reference a common third work in their bibliographies. It is an indication that the two works treat a related subject matter.”
referenced from here: http://en.wikipedia.org/wiki/Bibliographic_coupling
isn’t linking directly, but isn’t that also what Wikipedia does, i.e. different pages can reference one page without that page having to reference either of these two pages.
To me all this is link-building of some sort just using different approaches.
Is Rand right in saying anchor text is dying and will be gazumped by co-citation? Probably not, but it’s a progressive step forward in trying to clean up junky, self-benefiting links. As I initially wrote:
“in an ideal world a co-citation link would be a natural way of adding extra useful information to a webpage”
I still think this counts; I also think you could use a good old-fashioned link.
I actually just experienced something along these lines (I think). You see, I am a newbie, but I’m starting to get the hang of the whole SEO world. I wrote all my on-page SEO and saw some decent results for my targeted keywords, but I needed to hire a linking agency, which also gave me some things to change on my page, one of which was including my keyword in some anchor text. Within 24 hours of doing this, my rankings dropped almost three pages for those keywords and never really recovered. Could this be because of the anchor text addition? I didn’t change any of the wording; I just made a few of my keywords into internal links.
Great post, Bill. It contained a lot of information that I didn’t previously know but can keep for future reference.
Thanks for the post. Honestly, the more I get into this, the more confused I get, especially about the relevance of hypertext. But I must say I learned a lot of new things from your post. 🙂
Hey Bill, I am amazed at the depth of your research into this. Very impressed.
I don’t really have anything else to add, just that I enjoyed the post and found a few nuances very intriguing.
Once again, a very well-thought-out post, Bill. Thank you! I took a look at Rand’s WBF where he discussed anchor text as dying. It was really good to go through your post, along with the comments, to fully understand this as well as all of your other co-citation observations.
Bill, thoughtful post. I think it is interesting that the term “semantic” has only been used 11 times so far in the article and the comments. Given that “semantic search” was such a hot topic a few years ago, the trend toward reduced importance of exact matches in anchor text seems like a continuation of enhancements that have been in progress for quite a while. And I think you were rather gentle in your criticism of Rand’s choice of the term co-citation to describe the declining importance of exact-match anchor text.
Thanks for the article, man. My brain hurts a little bit, but it’s better to know about this than not.
When you say the death of anchor text, does that mean that you think anchor text may someday have no part to play with the major search engines, like the keywords meta tag? Or will its importance just be diluted a bit by factors like co-citations, co-occurrences (and whatever else there is)?
I didn’t say that anchor text was dying, and I don’t believe that was something that Rand proved in his Whiteboard Friday post. I think he also admitted that part of his post probably took things too far.
Regardless, if Google does look at a wider range of ranking signals, then chances are each of the signals it uses carries a little less weight. I also think that Google is using anchor text in a smarter manner than we may have thought in the past, and chances are that it ignores it, or devalues it, in things like blog comment spam.
Looks like a good substitute, because we all know that anchor text is heavily abused. It’s easier to get co-citations than to build links with anchor text. I mean, co-citation is easy to game, like conventional backlinks. This issue has been floating around; there are a lot of people talking about this co-citation thing.
Thanks, Bill. These are great insights! As you clearly articulate, and as others have noted, I think it all boils down to quality content. Google has said as much, and it is definitely showing in the results following changes in the algo. It’s easy to get lost in all of the different inputs in their model, but if you focus on providing value to your readers and establishing “authorship”, you will achieve the desired results.
This post was not easy to understand at first; I had to spend a lot of time going through it. To summarize, I would say that these points really make a difference in ranking.
Analyzing Thomasnet.com using OpenSiteExplorer.com (paid version), the following query-relevant anchors were found (without synonym analysis):
“directory of manufactures” (pointing links:5 / pointing domains:10)
“manufacturing” (pointing links:1395 / pointing domains:45)
“directory” (pointing links:387 / pointing domains:159)
“manufacturers” (pointing links:1712 / pointing domains:806)
“directories” (pointing links:6 / pointing domains:5)
OpenSiteExplorer on-page grade: F (fail), because:
H2 to H4: NO
Aha! I love that they changed “Anchor Text is Dying… And Will Be Replaced by Co-citation” to “Anchor Text is Weakening… And May Be Replaced by Co-Occurrence”.
I seriously can’t see anchor text ever dying, but co-citation is a move in the right direction for Google. It might make things harder for us online marketers, though.
Thanks for this great post, Bill. I definitely think search engines (especially one) are continuously tweaking their algos, although I thought anchor text would have been one constant indicator. Inbound links will, IMO, always be a big factor, simply because they are one of the only off-site factors and so give information about authority, and anchor text is definitely a big signal for inbound links. I can’t imagine any other way for Google to get that information about a site, except from other sites, which is what tools that reproduce Google’s authority calculation (MajesticSEO, Ahrefs, etc.) rely on.
Looking over this article, this actually seems like something that is likely to happen. Anchor text is one of the easiest things to manipulate, and introducing co-citations makes it much harder for webmasters to “guess” what types of related terms they should build links for. Google has access to this type of data, so it knows the relevance of a keyword much better than webmasters will. This is another thing it can use to judge the quality of links to a website. If the links are natural, then the anchor text should not only be diversified, but some of these related phrases that you would probably never think to use should appear as anchor text relatively often. This just goes to show how important it is to diversify your anchor text and not build keyword-focused websites.
You might want to add how Author Rank is going to change the way we view link building.
Great article! I missed Rand’s Whiteboard Friday on this, but this post really has me thinking about it in a new way. Thanks for sharing!
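One way to check how diversified a page’s anchor text is would be to tally its inbound anchors. The sketch below is a hypothetical helper, not a feature of any SEO tool; it assumes you already have a plain list of anchor strings (for example, exported from a backlink tool) and reports how much of that list is a single exact-match phrase. The example anchors are made up.

```python
from collections import Counter


def anchor_profile(anchors: list[str], target_phrase: str) -> dict:
    """Summarize how concentrated a page's inbound anchor text is."""
    normalized = [a.strip().lower() for a in anchors]  # ignore case and stray spaces
    counts = Counter(normalized)
    total = len(normalized)
    exact = counts[target_phrase.strip().lower()]
    return {
        "total": total,
        "distinct": len(counts),
        "exact_match_share": exact / total if total else 0.0,
        "top": counts.most_common(3),
    }


# Hypothetical anchor list for a page targeting "blue widgets".
example = ["blue widgets"] * 6 + ["Acme", "acme.com", "click here", "Blue Widgets "]
print(anchor_profile(example, "blue widgets")["exact_match_share"])  # 0.7
```

A high exact-match share is the kind of pattern the comments here describe as looking unnatural; what share counts as “too high” is not something this sketch can tell you.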
And this is why SEO will give me an aneurysm one of these days…
Interesting perspective. While anchor text may be dying, I don’t think the backlink will die anytime soon. As someone above mentioned, inbound links will always have to be a metric for ranking. All social signals are essentially backlinks, and while I believe Big G could classify such links, it would seem that its algorithm would have to do so on a domain-by-domain basis (but Google does have the resources). The idea of co-citation, or of citations in general, is not something new, as Maps/Places experts have understood the importance of such things; it just looks like it is (or will be) carrying over to organic ranking factors. Great post!
I’d also like to point out that for the query “cell phone ratings”, the Consumer Reports page that ranks does have the word “ratings” appear 15 times on the page (just using ‘find’ in the browser), and most other pages ranking in that SERP have “reviews” in the title and not “ratings”.
Additionally, all three searches in question have relatively low search volumes (about 100-1300 locally). I’d love to see this on something getting 5000+ searches a month.
While I’m not denying there might be co-occurrence at play, or other non-traditional ranking factors, I am curious as to whether these are just lower-competition queries to begin with.
Thank you, Bill, and the others for your comments. I’ve published this on my Google+ page. My intuition about using related terms in the anchor text for variation is confirmed here. I will look more into co-occurrence concepts.
Another easy way to find related terms is to use Google Insights, or simply to look at the bottom of the Google results page, where the most-used related terms are listed.
I’d love to see the effect that co-citation would be having at lower levels. The companies he chose are wide-reaching, with a vast amount of content online concerning them. I wonder what the minimum level would be for co-citation to actually have an effect on rankings? I assume we’ll only be able to dive into that data once co-citation moves up as a ranking factor and people begin to take it more seriously.
It’s really not co-citation at work in what Rand described. A co-occurrence reranking approach could potentially work with much less competitive queries.
There’s a specialized meaning for “related terms” under the phrase-based indexing patent, and they aren’t from Google Insights or the query refinements that might appear at the bottom of search results pages. Those are calculated differently. I mentioned these “related terms” in this post because they tend to be phrases that co-occur on the top (10, 100, 1000) pages returned in search results for a specific query.
Pages that tend to contain more of the co-occurring terms can be boosted in search results.
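The co-occurrence reranking described above can be sketched roughly as follows. This is a simplified illustration, not the procedure in the phrase-based indexing patents: it assumes “related terms” are one- and two-word phrases that appear on at least half of the top pages for a query, ignores stopwords and the query’s own words, and adds a fixed, hypothetical boost for each related term a result contains.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "with"}


def phrases(text: str, n_max: int = 2) -> set[str]:
    """Lowercased word n-grams of 1..n_max words found in a page's text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)}


def related_terms(top_pages: list[str], query: str, min_share: float = 0.5) -> set[str]:
    """Phrases that co-occur on at least min_share of the top pages for a query."""
    ignored = set(query.lower().split()) | STOPWORDS
    doc_freq = Counter()
    for text in top_pages:
        doc_freq.update(phrases(text))  # each page counts a phrase at most once
    cutoff = min_share * len(top_pages)
    return {p for p, df in doc_freq.items()
            if df >= cutoff and not set(p.split()) <= ignored}


def rerank(results, top_pages, query, boost=0.1):
    """Re-sort (url, score, text) results, boosting pages by related terms present."""
    related = related_terms(top_pages, query)
    rescored = [(url, score + boost * len(phrases(text) & related))
                for url, score, text in results]
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

Note that nothing in this sketch looks at links at all: the boost comes only from terms on the ranking pages themselves.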
Question: does having more co-occurring terms on the page that SENDS the link also boost the value of the link?
And can a large number of co-occurring terms on the sending page overcome the fact that the link-sending domain is not in a niche related to the link-receiving domain (in other words, if I submit a guest post to a site, can I get around the fact that the site is in a different niche by simply making sure my article has enough co-occurring terms)?
The patents from Google have nothing to do with any of the pages having links on them to a specific page. I can see how that might be inferred from what Rand presented in his video, but it’s wrong. This has more to do with the fact that if you perform a search for a specific query and examine the pages that show up in the results, you might see some of the same terms or phrases appear on a regular basis. A search engine might boost those pages in search results because of the presence of terms that tend to co-occur on a lot of those pages. These patents don’t say anything whatsoever about “link sending sites.”
I re-read this post… and every comment, to dig out as much as I could. Testing and seeing what works best. Thanks 🙂
It does seem that too much emphasis is being placed on the words within the anchor text. Rather than doing so, content writers should place links on the existing text within the article that best describes the linked website.
More and more, Google is having pages marked manually and then applying to its algorithm any consistencies it finds within that manual marking process. What we will increasingly find is that the algorithm is based less on best guesses and more on how people actually think in context.
Increasingly, the more links are built with the same anchor text, the more likely it is that you will see those rankings diminish as a result of looking unnatural. There is definitely a weighting ratio, though with hundreds of other factors, it is simply one of very many.