Anna Patterson, Creator of Phrase Based Indexing
The builder of what was then the largest search engine in the world joined Google shortly after building it, and possibly licensed the technology behind it to Google. She worked at Google for several years, developing a way of indexing pages based upon the meaningful phrases that appear on them. She looked at how phrases co-occur on pages to cluster and rerank those pages, used phrases to identify spam pages and pages with duplicate content, and created taxonomies and snippets for pages using phrases. This phrase-based indexing system provided a way to defeat Googlebombing and to determine how much anchor text relevance should be passed along with links.
Then Anna Patterson left Google to start the search engine Cuil, which was supposed to be a Google killer. Except it wasn’t. Now she’s back at Google and looks to be working on phrases again.
Multiple Generations of Patents involving Phrase Based Indexing
Her phrase-based indexing system can be said to have gone through three generations, each described in its own generation of patents.
The first generation of this patent family was filed on July 26, 2004, or within the couple of years that followed.
The second generation of phrase-based indexing patents appears to have been filed on March 30, 2007, and describes how phrase-based indexing could be implemented in a large-scale data system. Unfortunately, a few of these second-generation patents appear to still be pending and haven’t been made public yet.
The third generation of phrase-based indexing patents is starting to make it onto the scene with the refiling and recent granting of a continuation version of one of the original first-generation patents.
Single Word Indexing
In addition to ranking documents based upon the quality and quantity of links pointing to them, Google also looks at whether or not the query terms searched for appear on specific pages. Google’s Matt Cutts wrote one of the best descriptions of how Google may do this in the first Google Librarian Newsletter, which appears to have disappeared from the Web not too long ago. However, I found a copy on the University of Michigan website; it’s a highly recommended document, and I’ll build upon it in the rest of this post.
That first newsletter asked and answered the question “How does Google collect and rank results?” As you read it, pay special attention to where it talks about “posting lists.” If you start reading through the second generation of phrase-based indexing patents, you’ll see references to how phrases may be included in posting lists as well.
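To make “posting lists” a little more concrete, here is a toy sketch in Python, assuming a tiny in-memory collection and simple whitespace tokenization. Each term, whether a single word or a short phrase, maps to a sorted list of the IDs of documents that contain it. A multi-word query can be answered by intersecting the posting lists of its individual words, or a phrase can be looked up directly in its own posting list. The names build_index and and_query and the max_phrase_len parameter are hypothetical illustrations, not anything taken from Google or the patents.

```python
from collections import defaultdict

def build_index(docs, max_phrase_len=3):
    """Map every word and every contiguous phrase of up to max_phrase_len words
    to a sorted posting list of the IDs of documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for n in range(1, max_phrase_len + 1):
            for i in range(len(words) - n + 1):
                postings[" ".join(words[i:i + n])].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def and_query(index, words):
    """Single-word retrieval: intersect the posting lists of every query word."""
    if not words:
        return []
    lists = [set(index.get(w.lower(), [])) for w in words]
    return sorted(set.intersection(*lists))

docs = {
    1: "the president of the united states lives in the white house",
    2: "the vice president met the press",
    3: "the united states open tennis tournament",
}
index = build_index(docs)
print(and_query(index, ["president", "united", "states"]))  # [1]
print(index["united states"])  # [1, 3]: a phrase with its own posting list
```

Storing phrases as index terms makes the index much larger, which is one reason the second-generation patents spend so much effort on how such posting lists could be tiered and sharded.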
Phrase Based Indexing
A number of the first generation phrase-based indexing patents were filed on July 26, 2004, and the descriptions of most of those patents are substantially the same, though the claims differ.
I’ve written several posts about phrase-based indexing, and the one that provides the most detailed look at this approach is one I published on December 29, 2006 – Phrase Based Information Retrieval and Spam Detection. (I highly recommend that you stop and read that post before moving on with this one.)
SEO by the Sea posts on the first generation of phrase-based indexing patents:
- February 10, 2006 – Move over pagerank: Google’s looking at phrases?
- May 19, 2006 – Google Aiming at 100 Billion Pages?
- September 16, 2009 – Google Phrase Based Indexing Patent Granted
SEO by the Sea posts on the second generation of phrase-based indexing patents:
- March 15, 2009 – What are the Top Phrases for Your Website?
- April 7, 2010 – Phrasification and Revisiting Google’s Phrase Based Indexing
Assumptions and Approaches behind Phrase Based Indexing
1) It’s possible to distinguish between a good phrase and one that isn’t so helpful. A good phrase has meaning in itself; “ice cream” means something different from just “ice” and “cream.” A good phrase is a complete phrase, like “president of the United States,” instead of “president of the.” A phrase can be one word long. A phrase can have more than one meaning, such as “German Shepherd,” which can mean a sheepherder in Germany or a specific breed of dog.
2) Certain phrases tend to co-occur with other phrases. For instance, if you did a search for “President of the United States” and looked at the top 10, top 100, or top 1,000 pages in that search, you would probably see several related terms that appear regularly on those pages, such as “White House,” “vice president,” “Oval Office,” “Washington, DC,” and so on. It might be possible to rerank those search results to boost pages that contain more of these commonly co-occurring related phrases. Pages that contain statistically far more of these phrases than would be expected, however, may be considered spam.
3) Where a phrase has more than one meaning, there might be “clusters” of related phrases of different types. When the phrase is “German Shepherd,” one set of related phrases appearing in the top (10, 100, 1,000) search results might involve terms like “kennel,” “dog collar,” “dog house,” “obedience training,” etc., which might indicate one meaning of the phrase. However, when a second group of documents that rank for “German Shepherd” includes terms like “sheep herding,” “Germany,” “large flock,” and “grazing space,” those phrases may indicate a second meaning, describing a person from Germany who herds sheep.
4) Anchor text in links pointing to a page that includes the phrase or a related phrase (one that tends to co-occur on pages that rank for that phrase) should be given more weight than anchor text that doesn’t. So if a page containing the biography of the President of the United States is the target of a Googlebomb using anchor text like “miserable failure,” those links won’t help the page rank for the term “miserable failure” unless the page is somehow relevant for the term. For example, a few years back, Google announced that they had defeated a specific Googlebomb aimed at the biography page of George W. Bush using the phrase “miserable failure.” The page stopped ranking for the term, at least until someone at the White House inadvertently caused the Googlebomb to return by adding the word “failure” to the page during an update.
5) Google could also purposefully get a page to stop ranking for a specific phrase by removing the connection between the page and the phrase in its index, which might be a way to penalize a page for spam-type practices.
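The first assumption, that good and complete phrases can be told apart from fragments, can be illustrated with a small Python sketch. It rests on two simplifications of my own rather than the patents’ actual statistical tests: a phrase must appear in at least min_docs documents and must not begin or end with a stopword, and a phrase is treated as incomplete when it never appears without one of its longer extensions (so “president of the united” gives way to “president of the united states”). The stopword list, thresholds, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would need far more.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}

def ngrams(text, max_len):
    """Yield every contiguous word sequence of 1..max_len words as a tuple."""
    words = text.lower().split()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])

def good_phrases(docs, min_docs=2, max_len=5):
    # Document frequency: count each phrase at most once per document.
    df = Counter()
    for text in docs:
        df.update(set(ngrams(text, max_len)))
    # Keep frequent phrases that don't start or end with a stopword.
    frequent = {p: n for p, n in df.items()
                if n >= min_docs and p[0] not in STOPWORDS and p[-1] not in STOPWORDS}

    # Assumed test for incompleteness: the phrase never occurs without a longer
    # extension of itself (the extension has the same document frequency).
    def incomplete(p):
        return any(len(q) > len(p) and q[:len(p)] == p and n == frequent[p]
                   for q, n in frequent.items())

    return {" ".join(p) for p in frequent if not incomplete(p)}

docs = [
    "the president of the united states spoke",
    "a letter to the president of the united states",
    "president of the club",
]
print(sorted(good_phrases(docs)))
# ['president', 'president of the united states', 'states', 'united states']
```

With a three-document toy corpus the thresholds are tiny; the point is only that frequency and “does it predict its own extension” are enough to keep complete phrases while dropping fragments like “president of the.”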
First Generation Phrase-Based Indexing Patent Filings
- Phrase-based indexing in an information retrieval system (US Patent No. 7,536,408)
- Phrase-based detection of duplicate documents in an information retrieval system (US Patent No. 7,711,679)
- Information retrieval system for archiving multiple document versions (US Patent No. 7,702,618)
- Detecting spam documents in a phrase based information retrieval system (US Patent No. 7,603,345)
- Phrase-based searching in an information retrieval system (US Patent No. 7,599,914)
- Phrase-based generation of document descriptions (US Patent No. 7,584,175)
- Phrase-based personalization of searches in an information retrieval system (US Patent No. 7,580,929)
- Phrase identification in an information retrieval system (US Patent No. 7,580,921)
- Multiple index based information retrieval system (US Patent No. 7,567,959)
- Automatic taxonomy generation in search results using phrases (US Patent No. 7,426,507)
Second Generation Phrase-Based Indexing Patent filings
Many of these patent filings haven’t yet been published by the USPTO, and may not be until they are granted, as was the case with many of the first-generation phrase-based indexing patents. While a couple of the published patents list Anna Patterson as an inventor, many don’t. The first one listed is a pending patent application.
- Integrating External Related Phrase Information into a Phrase-based Indexing Information Retrieval System (US Patent Application 20090070312)
- Index server architecture using tiered and sharded phrase posting lists (US Patent 7,693,813)
- Index updating using segment swapping (US Patent 7,702,614)
- Query scheduling using hierarchical tiers of index servers (US Patent 7,925,655)
- Phrase Extraction Using Subphrase Scoring, filed Mar. 30, 2007 (unpublished)
- Bifurcated Document Relevance Scoring, filed Mar. 30, 2007 (unpublished)
- Index Server Architectures in Tiered and Sharded Phrase Posting Lists, filed Mar. 30, 2007 (unpublished)
- Query Phrasification, Ser. No. 11/694,845, filed Mar. 30, 2007 (unpublished)
Third Generation Phrase-Based Indexing
There appears to be only one of these at this point. However, Google might file more continuation patents, which add new claims to existing patents in this series, or divisional patent filings, which might separate some claims from a specific patent and focus upon expanding those claims.
Detecting spam documents in a phrase based information retrieval system (US Patent No. 8,078,629)
Invented by Anna Lynn Patterson
Granted December 13, 2011
Filed: October 13, 2009
An information retrieval system uses phrases to index, retrieve, organize, and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are indexed according to their included phrases. A spam document is identified based on the number of related phrases included in a document.
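The ideas in that abstract (phrases that predict other phrases, reranking by related phrases, and spam identified by the number of related phrases a document contains) can be sketched in Python. This is a simplified illustration under stated assumptions, not the patent’s method: it calls a phrase “related” to a target when the two co-occur in documents at least min_ratio times as often as they would by chance (a simple observed-to-expected ratio standing in for the patents’ own statistics), and it flags a result as likely spam when its related-phrase count exceeds a fixed spam_limit. Both thresholds, the function names, and the toy corpus are hypothetical.

```python
def related_phrases(doc_phrase_sets, target, min_ratio=1.5):
    """Phrases that co-occur with `target` more often than chance would predict.

    doc_phrase_sets: one set of (already identified) good phrases per document.
    min_ratio: hypothetical threshold on observed / expected co-occurrence.
    """
    total = len(doc_phrase_sets)
    with_target = [s for s in doc_phrase_sets if target in s]
    p_target = len(with_target) / total
    related = set()
    for phrase in set().union(*doc_phrase_sets) - {target}:
        p_phrase = sum(phrase in s for s in doc_phrase_sets) / total
        observed = sum(phrase in s for s in with_target) / total
        expected = p_target * p_phrase  # co-occurrence rate if independent
        if expected > 0 and observed / expected >= min_ratio:
            related.add(phrase)
    return related

def rerank(results, related, spam_limit):
    """Boost results that contain more related phrases and set aside likely spam.

    results: (doc_id, phrase_set) pairs in their original ranked order.
    spam_limit: hypothetical cap on how many related phrases a natural page has.
    """
    kept, spam = [], []
    for doc_id, phrases in results:
        count = len(phrases & related)
        if count > spam_limit:
            spam.append(doc_id)
        else:
            kept.append((count, doc_id))
    kept.sort(key=lambda item: -item[0])  # stable sort keeps ties in original order
    return [doc_id for _, doc_id in kept], spam

corpus = [
    {"president", "white house", "oval office"},
    {"president", "white house", "vice president"},
    {"president", "recipe"},
    {"recipe", "oven"},
    {"recipe", "oven"},
    {"oven", "garden"},
]
related = related_phrases(corpus, "president")
print(sorted(related))  # ['oval office', 'vice president', 'white house']

results = [(2, corpus[2]), (1, corpus[1]), (0, corpus[0]),
           (99, {"president", "white house", "oval office", "vice president"})]
print(rerank(results, related, spam_limit=2))  # ([1, 0, 2], [99])
```

A fixed spam_limit is the crudest possible choice; comparing each page against the number of related phrases a typical page contains would be closer to the statistical reasoning the post describes.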
Beyond the sheer number of patents, there are many reasons to believe that Google is using phrase-based indexing. It’s worth spending some time experimenting with phrases to get an idea of how Google treats them.
If you perform keyword research, optimize web pages, and do link building, you’ll find that understanding how phrase-based indexing works will be helpful.
On the plus side, even if Google isn’t doing phrase-based indexing quite like what is described in these patents, understanding things such as what terms might be “related” to terms or phrases that you might want to optimize a page for and working to include those related phrases on your page can result in richer and higher quality pages.
The very first phrase-based indexing patent (Phrase-based searching in an information retrieval system) was updated with a continuation patent. I wrote about it in the post Google Phrase-Based Indexing Updated.
All parts of the 10 Most Important SEO Patents series:
Part 1 – The Original PageRank Patent Application
Part 2 – The Original Historical Data Patent Filing and its Children
Part 3 – Classifying Web Blocks with Linguistic Features
Part 4 – PageRank Meets the Reasonable Surfer
Part 5 – Phrase Based Indexing
Part 6 – Named Entity Detection in Queries
Part 7 – Sets, Semantic Closeness, Segmentation, and Webtables
Part 8 – Assigning Geographic Relevance to Web Pages
Part 9 – From Ten Blue Links to Blended and Universal Search
Part 10 – Just the Beginning
Last Updated June 19, 2019
64 thoughts on “10 Most Important SEO Patents, Part 5 – Phrase Based Indexing”
That’s completely true; my own experience “optimizing web pages, and doing link building” proves Google is using Phrase Based Indexing.
Thanks for information!
It’s nice to get a little refresher on what search engines really do :). Not much to say here, Bill, but once again, stellar job with this post. I ended up checking out a few of the posts from 2006 you mentioned; it’s a relief to know at least some part of SEO is the same as it was 5 years ago.
Hmmm…This makes me wonder if somewhere buried in the algorithm is coding that is in some way, shape or form related to the same coding that makes the Microsoft Word software make grammatical recommendations whenever it feels that you worded a sentence incorrectly.
In other words, if the wording on your web page gets flagged for incomplete or grammatically incorrect phrasing, will its SERP result be tarnished in some way?
I bet you something similar is going on here.
LOL. Wouldn’t that be funny if one of the Google bots used MSWord as part of the ranking algorithm 🙂
Loving this series of posts. Never seem to find the time I should to follow patents, so an authoritative round up like this is great.
This post has me wondering whether phrase based indexing also applies to incoming anchor text. Seems like a logical step, and it might tally with some of the accounts of over-optimised anchor text.
I’ve found that utilizing LSI has been the most important factor for building relevance. For anchor text, alternating phrases that Google considers related is key.
Actually, it’s quite an eye-opener, since I’ve never heard about phrase based indexing. Maybe that’s because I was too busy blogging rather than looking at other important aspects! Thanks for letting me know.
Heya – thanks for this article; I need to take some time to read the prior articles as well now. =) What amazes me is that it is possible to come up with an algorithm to supposedly determine some of this stuff. Though… I am not convinced Google has automated the process as much as they would have us believe.
I’m not familiar with phrase based indexing; I think I need to read more about it.
I read this one and the one posted on Dec 29, 2006. Both of them are informative and definitely useful for newbies like me. Once again, thanks for sharing.
Mmmm… I will have to read through those patents, it looks quite interesting. And I do not think you are mistaken. From my experience I have found that Google seems to frown on incoherent sentences and random text – I get the impression that sometimes Google “resets” words in a different order, just because it makes more sense, and that keywords consisting of different words usually rank better in their “natural” order.
This, of course, also seems to be in line with the Panda algorithm update. Grammar seems to count now, but would Google really start analyzing the grammar of every sentence? That would be very computationally intensive, probably more than it’s worth. I would say that it is more likely that they have a huge database of “acceptable phrases” for grammar verification, and to handle that you would require some kind of phrase based indexing. Apply Occam’s razor and you get…
Gosh, it really needs a deeper look… “a Google science.”
It definitely helps to think about how phrase based indexing might influence what you might do when engaging in those activities. 🙂
Thank you. Many of the ideas and assumptions that search engineers hold seem to be very similar to what they were back then, but the technology that they use has definitely given them the ability to do more.
For example, Google’s Big Daddy infrastructure update made it much easier for Google to test new algorithms and try new things out. Google’s Caffeine update enabled them to see the impact of those changes much faster, and to pay more attention to a greater number of signals.
I’m not sure if it’s a good idea to compare Microsoft’s grammar checker algorithm to Phrase-Based indexing. A large part of the phrase based indexing approach is in identifying “good” and useful phrases that appear on pages, and identifying other pages that might be relevant for specific queries to see if other phrases tend to co-occur on those pages, so that a semantic relationship might exist between those different phrases.
A grammar checker has some very different purposes and objectives.
Google is also working hard on building statistical language models, which can help them do things like identify the language that a page is written in, possibly identify the language of a query, and possibly help them understand and score the grammar of a specific page. Rather than approaching grammar from a heuristic viewpoint, where many different rules might be collected about how words should be joined together, this type of statistical approach looks at very large sets of documents and builds a model, based upon them, of how language is actually used. The approaches try to solve very similar problems, but from different perspectives.
I scratched the surface of some of that research in my post Google’s Paraphrase-Based Indexing, Part 2, and there are some links to whitepapers, blog posts, and patent filings that go into more depth in that post.
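The statistical approach to language identification described in this reply can be illustrated with a textbook technique that I’m swapping in for illustration: a character trigram model for each language, with add-one smoothing, choosing whichever model assigns a piece of text the highest probability. It is not Google’s model. The two training sentences are made up, and real models would be trained on very large document collections; TrigramModel and identify_language are hypothetical names.

```python
import math
from collections import Counter

def char_trigrams(text):
    """Overlapping 3-character slices, with spaces padding the word edges."""
    padded = "  " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

class TrigramModel:
    """Character trigram model with add-one (Laplace) smoothing."""

    def __init__(self, training_text):
        self.counts = Counter(char_trigrams(training_text))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen trigrams

    def log_prob(self, text):
        denom = self.total + self.vocab
        return sum(math.log((self.counts[t] + 1) / denom) for t in char_trigrams(text))

def identify_language(text, models):
    """Pick the language whose model makes the text most probable."""
    return max(models, key=lambda lang: models[lang].log_prob(text))

models = {
    "en": TrigramModel("the house is big and the dog is in the garden with the children"),
    "de": TrigramModel("das haus ist gross und der hund ist im garten mit den kindern"),
}
print(identify_language("the dog is in the house", models))  # en
print(identify_language("der hund ist im haus", models))  # de
```

Nothing here encodes grammar rules; the model simply learns which character sequences are common in each language, which is the contrast with a heuristic, rule-based approach drawn above.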
The phrase based indexing patents do seem to apply to incoming anchor text as well. It appears that if anchor text pointing to a page contains a “related phrase” (one that tends to co-occur on other pages that rank well for the same query), it might be given more weight than anchor text that doesn’t.
There are some toolmakers who refer to the tools they make to help SEOs as “LSI” tools. Unfortunately, that’s probably very misleading. For someone to actually use LSI on Google, they would need complete access to Google’s index, that index would need to not change in any significant way, and LSI would probably not scale effectively on an index the size of Google’s.
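Here is a minimal Python sketch of that anchor text idea, using the “miserable failure” Googlebomb example from earlier in the post. It assumes the page’s good phrases and a map from each phrase to its related phrases have already been computed; anchor text that contains one of those phrases gets full weight, and anything else is multiplied by a discount factor. The discount value, the related_map input, and the function names are hypothetical, and the patents may weight anchor text quite differently.

```python
def contains_phrase(text, phrase):
    """True if `phrase` appears in `text` as a whole-word sequence."""
    normalized = " ".join(text.lower().split())
    return f" {phrase} " in f" {normalized} "

def anchor_weight(anchor, page_phrases, related_map, discount=0.1):
    """Weight for the anchor text of a link pointing at a page.

    related_map: phrase -> set of phrases related to it (assumed precomputed).
    discount: hypothetical multiplier for anchor text unrelated to the page.
    """
    relevant = set(page_phrases)
    for phrase in page_phrases:
        relevant |= related_map.get(phrase, set())
    if any(contains_phrase(anchor, p) for p in relevant):
        return 1.0
    return discount

page = {"president", "white house", "biography"}
related_map = {"president": {"oval office", "vice president", "white house"}}
print(anchor_weight("tour of the Oval Office", page, related_map))  # 1.0
print(anchor_weight("miserable failure", page, related_map))  # 0.1
```

Discounting rather than ignoring unrelated anchor text is a design choice of this sketch; it keeps some value for legitimate but loosely worded links while blunting a Googlebomb.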
Fortunately, you probably don’t need LSI tools to be able to find phrases that tend to co-occur on pages that rank for the same query terms.
You’re welcome. There are a few people who have written about Phrase based indexing in the past, like David Harry – What you need to know about phrase based optimization and at places like Webmaster World, but it probably hasn’t been discussed as much as it should be.
I was somewhat apprehensive about how much Google might have implemented phrase based indexing until I started to see the second generation of patents from them that described how it could be incorporated into their search engine. That’s not a complete and absolute indication that they have, but it’s a signal that definitely points to the possibility.
That’s one of the reasons why I started this series: to give people who might not have had a chance to spend much time with topics like phrase based indexing an opportunity to revisit them. I’ve written a number of posts about some of the different patents in the past, and this was my chance to bring those all together along with links to all of the patent filings I could uncover on the topic.
Phrase based indexing isn’t so much about grammar as it is about the fact that when someone writes about a certain topic, they tend to include other related phrases in what they’ve written.
If I write something about an NFL football stadium, chances are that I’m going to include phrases like “end zone,” “sidelines,” “first down marker,” “concession stands,” “tailgating,” “parking spaces,” “playing field,” and so on. Google might look at the top 1,000 results on a search for [football stadium] to identify which “good” phrases tend to show up in those pages and identify the ones that do as “related phrases.” It might then look to see which pages contain many of those phrases, and boost the ones that do in the rankings. If a page has too many of those related phrases based upon statistical probabilities, then it might be identified as spam. If there are “clusters” of phrases on pages ranking for that term that indicate that “football stadium” has different meanings, such as an NFL stadium or a European football stadium, then it might cluster those pages together, understanding the different meanings. It might then try to provide some diversity in search results by including some pages for NFL football stadiums and some pages for European football stadiums.
In that particular case, with football stadiums possibly showing up for NFL football stadiums, European football stadiums, and Australian Rules Football stadiums, Google might also use its “preferred country” algorithm to rank those results as well.
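The clustering step in that football stadium walkthrough can be sketched in Python under some stated assumptions. Two related phrases are linked when they appear together on at least min_shared of the pages returned for the ambiguous phrase, and each connected group of linked phrases is treated as one meaning; the connected-components (union-find) approach is my own choice for illustration, not something the patents prescribe. The stadium phrases, the min_shared threshold, and the function names are hypothetical.

```python
from itertools import combinations

def cluster_related(related, pages, min_shared=1):
    """Split related phrases into clusters whose members co-occur on the same pages.

    pages: phrase sets for pages returned for the ambiguous phrase.
    Clusters are connected components of a graph that links two related phrases
    when they share at least min_shared pages (a hypothetical threshold).
    """
    parent = {p: p for p in related}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(sorted(related), 2):
        shared = sum(a in page and b in page for page in pages)
        if shared >= min_shared:
            parent[find(a)] = find(b)
    clusters = {}
    for p in related:
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())

def best_cluster(page, clusters):
    """Assign a page to the meaning whose related phrases it shares most."""
    return max(clusters, key=lambda c: len(page & c))

pages = [
    {"football stadium", "end zone", "tailgating"},
    {"football stadium", "first down marker", "end zone"},
    {"football stadium", "penalty area", "terraces"},
    {"football stadium", "corner flag", "penalty area"},
]
related = {"end zone", "tailgating", "first down marker",
           "penalty area", "terraces", "corner flag"}
clusters = cluster_related(related, pages)
print(sorted(best_cluster({"football stadium", "tailgating"}, clusters)))
# ['end zone', 'first down marker', 'tailgating']
```

Once each page is assigned to a meaning, a results page could be diversified by drawing a few top pages from each cluster in turn.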
As I mentioned in my response to Mark above, Google may be looking at grammar when it comes to the things they’ve been doing with statistical language models, and that may help them do things like identify spun articles or content on pages, and quite possibly in ranking pages for something like Panda which involves classifying documents based upon features associated with those documents, which might include something like a grammar score (don’t know if it does or not, but it’s a possibility).
So does that mean “how to make money online” is better than just “make money”?
And will that phrase be indexed by Google sooner than the short one?
It’s a very useful article for me as a new web developer. I can add that Google is not very friendly to sites from the .ru domain zone, or the CIS zone as a whole. And you can improve your ranking by using unique pictures on your site and building a web of internal links.
I tend to agree with David B (above) in that LSI seems to play a major role in ranking. I think that often people focus in on a particular keyword or phrase and hit it too hard.
It’s common sense that for Google to do its job well, it must look for natural backlinks. It must try to decipher which sites are being linked to naturally as a result of valuable, relevant content and which are being linked to as a result of negotiation and link building. Using LSI keywords will give a much more natural look to your inbound links.
I find that a good cross section of related terms (in anchor text links) mixed in with a reasonable level of use of the main term, always works better.
I have always avoided carefully crafted link wheels and many of the other structured techniques that many people use. Not because I’m aware of something in the algorithm that picks this up but for the potential for it or the potential for manual checking. I personally feel that the more abstract your link building is, the more natural the links will appear. I think that creating carefully organised link wheels etc, is almost as good as holding up a sign saying I’M A SPAMMER. Create good content first and tweak it to suit the search engines and you are keeping both the visitors and Google happy. Good content will also attract natural backlinks.
Very interesting post, and thanks for the several links out to more good information.
Hope you don’t mind me saying, a very long second sentence. (lol).
It’s possible that Google might see either phrase as good and valid phrases, regardless of the length of either. But the “related phrases” that might be appropriate for each of those phrases is likely going to be different based upon which phrases tend to co-occur with those in a top number of results on searches for those terms. A page that included a number of “related phrases” for one of those terms, and possibly linked to that page from other pages with some of those related phrases might rank better than one that otherwise might be similar in terms of other ranking signals that Google might use.
Thanks for your answer. I believe that we were, however, talking about the same thing. What you point out is that Google is probably using phrase-based indexing for semantic recognition (which seems quite evident, especially after reading your post), and I was simply pointing out that it can also be used to recognize grammatical correctness. I must, however, point out that I was not thinking about short terms (two or three keywords), but rather about somewhat longer sentence constructs (6-8 words), of which Google by now must have compiled a huge volume. From my experience, Google is capable of detecting at least some grammar errors; I had a few customers penalized after Panda because of it, who bounced back after they corrected the text.
Regarding the response to Mark on language recognition, I am not sure that you disprove the use of phrase-based indexing for that purpose. Last year I actually wrote a four-post series on search engine language recognition of websites (inspired by your article “how search engines know the language of a query,” thanks for the idea). I also reached the conclusion that Google, at least, uses probabilistic methods (amongst others) for language detection, as highlighted for example in the Google research blog post http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html#!/2006/08/all-our-n-gram-are-belong-to-you.html. But the use of N-grams by Google also implies some kind of phrase-based indexing…
The mysteries of Google – ones all modern-day business people would love to get their hands on. Their phrase based indexing is certainly a smart and (generally) efficient way to rank pages or determine their relevance, though as we all know there are over 200 different ways they decide those things. Oh, and to Apellierre, do you know that a site such as one with a .ru domain is often ranked differently on the various Google servers? (Meaning if you look up a phrase that an .ru site ranks high for on Russia’s Google and then check it on the US’s Google, it probably won’t be the same, simply due to geographic differences.)
I spent half an hour reading this awesome post over my coffee, and I found this information very useful, especially for newbie SEOs.
Thanks. Great questions.
It’s definitely likely that Google is using technologies that can work together to further the way they are trying to index content and make sense of the Web. A phrase-based indexing approach and an n-gram approach can work together well, and it’s definitely possible that grammar is an element of Panda.
One of Mark’s questions was about whether phrase based indexing might be used as a grammar checking program, and that’s really something that isn’t discussed in any of the phrase-based indexing patent filings in any manner, that I can recall. I’m the one that brought up language recognition as an aspect of statistical language models, and ideally a phrase based indexing approach would be aided by understanding of the language that a page might be written in. But I’m not sure that the phrase-based indexing approach is as helpful in identifying the languages of pages as an n-gram analysis. I don’t remember anything within the phrase based indexing patents that discussed identifying languages through phrase-based indexing.
The use of n-grams can help the search engine identify and understand phrases, as can the use of sites like Wikipedia and Freebase and Princeton’s Wordnet.
I’ll have to confess that I’ve done no research into how Google might be treating sites from .ru domains, so my knowledge in that area is limited.
Using unique pictures, and structuring internal links intelligently on a site can definitely be helpful.
I think it’s really important to be careful when referring to the use of “LSI” in SEO and ranking pages. LSI is an approach to indexing content within document repositories that originated in the early 90s. It focused upon databases that were much smaller than the Web and, unlike the Web, remained relatively static and unchanging, and it didn’t take advantage of things like links between pages.
There have been a number of indexing approaches developed since then that are much more suited for querying databases of Web content and finding content on them, including a number that attempt to understand semantic relationships between words like the phrase-based indexing approach.
Unfortunately, there have been a number of people who sell software tools aimed at helping people try to do things like keyword research and content development that claim to use “LSI” in the approaches that they use, but they really don’t. They don’t have the complete access to Google’s index needed. Google’s index changes too rapidly. And Google’s indexing uses many other signals, including link analysis type signals that LSI doesn’t consider. That’s a good part of the reason why I suggest staying away from using that specific term.
The idea of trying to understand which phrases Google may think are “related” and using some of those on your pages can definitely be helpful for a number of reasons, from broadening the content that you create in meaningful ways, to making sure that words people familiar with a topic might expect to see on your pages are on those pages, to creating the possibility that you may rank for those related terms as well. This phrase-based indexing approach does try to understand which phrases may be related based upon how often they tend to co-occur on pages that rank well for specific phrases. But that’s not LSI.
I definitely agree that it’s a good idea to use a range of related terms in links if possible instead of a single phrase.
Thank you, Steve.
Google definitely does use a wide range of signals to rank pages, and it’s really impossible to assign weights or percentages to any of them.
There are a lot of signs that appear to indicate that Google does use phrase-based indexing, but it doesn’t really matter if they do or don’t when everything is said and done. In reality, understanding that certain phrases tend to be related to other phrases, and trying to create content that may include related phrases is a helpful approach to building quality content for readers. Often when people try to write “optimized” content, their focus can become so narrow that what they are writing suffers from it. The fact that Google may be ranking pages higher that do use related phrases in both content and links is an added bonus in my eyes. 🙂
I often write many of my posts with my morning sip of coffee, though they can often take more than half an hour. 🙂 Happy to hear that you’re finding them interesting and useful.
Yup, I agree with your point. Content is always very important in blogging, and above all, quality is very important.
An incredibly insightful post on Phrase Based Indexing. I jumped in mid-stream so am headed back to catch up on the other posts in the series. Well done!
Hey, thanks for this article.
I am not sure about all the terms you’ve written about. The approach with the phrases is true, but I can’t imagine that Google can classify the quality of a text based only on this. Good content never used to be punished – mostly. You said that you found a copy of the newsletter on the University of Michigan website – would you post the link, or the path to it? I optimise web pages and also do link building, so it is very important to me to do my business as well as I can.
Thanks a lot!
As ever, a very deep and thought-provoking study of your subject matter. I am sure your conclusions hold plenty of water, though second-guessing which patents Google is actually employing in its algos, as opposed to which it is about to employ or may employ at some time, is a bit of a roulette wheel.
Your clarification of the often misunderstood concept of LSI was also interesting, and employing related terms within copy certainly has to be a helping hand for Google to better understand the meaning, intent and quality of your webpage, which in the end is the ultimate goal of SEO.
Thanks for burrowing so deep into your patent research, very stimulating!
Thanks for another very helpful and informative page. I am sure that by using related phrases I will be able to enhance the content of my blogs.
The quality of your pages and content is something that Google is aiming at using more and more, and they’ve been exploring a lot of different ways to measure it. Their Panda update is one of those, and this kind of phrase-based indexing is another approach that can really help them.
Thank you. Glad to hear that you’re enjoying the series.
I did link to that newsletter with the article from Matt Cutts. It’s in the post using the anchor text How does Google collect and rank results? (pdf), though I’m linking to it again here.
It’s very likely that Google will look at other signals as well, but phrase-based indexing is interesting because it provides Google with a way to statistically measure whether or not “related” phrases for a specific query (or phrases within a query) appear upon pages, and that can be a pretty helpful signal.
You’re absolutely right that the filing of a specific patent is no guarantee that Google is using a process described within that patent.
What’s interesting here is that (1) Google has filed a good number of related patents that describe a very detailed way of using phrase-based indexing, (2) those patents show that the approach can help with some serious problems, such as Googlebombing, and (3) the approach provides a benefit even if Google isn’t using phrase-based indexing. By using words that might be seen as “related” to a phrase that a page might be optimized for, you can create richer and more meaningful content for that page.
The goal of something like LSI is to try to understand how words might be semantically related to each other, and that’s true with phrase-based indexing as well. So even if what Google is using is more like phrase-based indexing, the ultimate goal is to create a page that Google will be more likely to index and rank well for the terms that people interested in what a site offers will use to search for and find the site.
Thank you. I’m sure that trying to use meaningful related phrases in your posts will be a good step towards higher quality pages.
That was a great article. I wasn't thinking along those lines with Panda, but the logic makes sense in terms of both anchor text phrase matching and potential spam detection. Good stuff!
Yeah, of course Google does phrase-based indexing. When writing content for your website, don't use only the keywords you're promoting; try to use related words or phrases, as in the "President of the United States" example explained above. For all SEO analysts, this will be one of the tips for promoting a site.
The first two assumptions give a clear guideline on what to do to achieve better rankings. Many webmasters discuss this and try hard without knowing the actual points or the basics of content optimization, and the first two points spell out exactly what is relevant and what is not.
Thank you. Once you first learn about something like Phrase-Based Indexing, you may start seeing phrases everywhere on the Web. I know that I have. 🙂
I agree. Having some idea of what words and phrases might be "related" to the terms that you've chosen to optimize a page for can be helpful, and it's something I consider when doing keyword research as well.
I do think paying attention to the assumptions that I listed behind phrase based indexing can result in better quality content, and can potentially help with ranking as well.
I wonder how this works in other languages. English grammar is simpler than that of some others. Word order in a German sentence can vary much more, and no particular order is better than the others. There is also an increasing number of words that may be spelled in different ways. All versions are correct German, but maybe not for Google…
I would suspect that the grammatical rules of English are often as complex as those of most other languages, and Google has spent years building statistical language models for many different languages, looking at many different pages on the Web. I think phrase-based indexing would likely work equally well regardless of the language in question.
Human brains vs. Google's algorithm... nice post.
In this particular instance, it looks like Google is trying to learn from human brains – how people write, and which words and phrases they tend to use together on pages when they write about something. 🙂
I just reread this (I need to sleep more), and it's starting to sink in, so I'm off to experiment. I have already seen some of my pages indexed for key phrases I haven't directly used in links, just because the phrase(s) were added to the pages they were linked from. As always, you're helping us little Jedis keep track of the Death Star.
Thanks, and may the Force be with you.
Hi, first of all, thanks for a great article.
I work on SEO in Turkey, and during my tests on the SERPs I am seeing several things similar to what your post describes.
For example, http://www.example.com.tr ranks in position 3 for the keyword "hamburger", but when you look at the website, you can see that example.com is a flashy site with almost no content. We also looked at its backlinks with 3-4 tools, and example.com has only 20 backlinks, none of them with anchor text that includes "hamburger".
PS: example = brand name
But when you search for intitle:example hamburger in Google, there are 1,500 results that include "example hamburger" in their titles. So I think Google is testing the phrase-based indexing model a lot.
Am I right to think this way?
This is a great read that really helps explain why exact-match anchor text is a bad thing to chase after. If you build content and you get the chance to select the anchor text for the link back to your website, my advice is to stop thinking like an SEO and start thinking more like a journalist. That will keep you from overdoing exact-match anchor text and instead give you some healthy phrase-based anchor text.
For example, if the website/page is about the product "iWidget" and the exact-match phrase you would typically build as anchor text is "buy iWidget", then try phrase anchors like "a website where you can buy the iWidget online" or "purchasing the iWidget from an online shop".
I just came across this article. It totally makes me rethink using exact-match anchor text for my backlinks. I'm curious whether I should use just the name of the website now. I suppose it's still safe to use anchor text as long as it's completely in the context of the page. I'm definitely going to watch out with my keywords now. Thanks for the heads-up.
It's possible that there might be some level of diminishing returns when the same anchor text is pointed at the same page over and over and over. I don't think the phrase-based indexing patents really tell us that. But they do tell us that if the anchor text in a link is related to the term that a page might be optimized for, it's possibly going to carry more weight than anchor text that isn't very related.
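Here's a toy Python illustration of that last point: anchor text that matches, or is related to, the term a page is optimized for counts for more than anchor text that isn't related. This is only a sketch under stated assumptions. The function name `anchor_weight`, the weights of 1.0 and 0.25, and the related-phrase lookup table are all hypothetical; the patents don't publish values like these.

```python
def anchor_weight(anchor, page_terms, related,
                  related_weight=1.0, unrelated_weight=0.25):
    """Return a hypothetical weight for a link's anchor text.

    anchor: the anchor text phrase, lowercased.
    page_terms: phrases the target page appears to be optimized for.
    related: a dict mapping each phrase to the set of phrases related to it.
    The weights are illustrative assumptions, not values from Google.
    """
    for term in page_terms:
        # Exact matches and related phrases both count as relevant anchors.
        if anchor == term or anchor in related.get(term, set()):
            return related_weight
    return unrelated_weight


# Hypothetical related-phrase table for a page about the iWidget.
related = {"iwidget": {"buy iwidget online", "iwidget accessories"}}
page_terms = {"iwidget"}

print(anchor_weight("buy iwidget online", page_terms, related))  # → 1.0
print(anchor_weight("cheap vacations", page_terms, related))     # → 0.25
```

A real system would more likely use graded scores than a simple two-level weight, but the sketch keeps the relatedness check itself easy to see.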
I like having that kind of variety in anchor text pointed to a page as well.
Include some variety in your anchor texts. Use terms that are reasonably related to the terms that pages are optimized for, and use some links with exact-match anchor text as well. The underlying idea behind phrase-based indexing is that certain terms and phrases tend to co-occur on pages when a page is actually about something in particular.