What concepts does your website cover?
A search engine might look at phrases that you use on your pages to get an idea of the concepts covered by your site.
The search engine might decide that certain phrases you use are the “top phrases” describing the topics or concepts of your site.
But what if the search engine is wrong?
What if those top phrases don’t reflect the content of your site accurately? What if some other phrases more meaningfully indicate what your site is about?
If a search engine assigned phrases to your site that might affect how your pages are presented to searchers in response to queries, would you want the search engine to give you a chance to change the phrases it thinks your site is about?
New Google Phrase-Based Indexing Patent Filing
A new patent filing from Google describes a way for website owners and site administrators to view the top phrases that a Google phrase-based indexing system has assigned to their sites, and to add related phrases to help in the indexing of their pages. It’s one of many Google patent filings involving phrase-based indexing.
Most search engines tend to index web pages based upon individual terms found within those pages rather than upon the concepts contained in them. Concepts are often expressed in phrases, and when certain phrases appear together on the same page, they may be able to tell us a fair amount about the topic of that page.
A few years ago, Google published a series of patent filings that explored a phrase-based indexing system, looking at how related phrases are used in the content of pages to index those pages, to understand the topics of pages, to provide personalized search to searchers, to locate duplicate content, and to identify webspam.
I’ve written a few earlier posts about this phrase-based indexing:
- Google Phrase Based Indexing Patent Granted
- Phrase Based Information Retrieval and Spam Detection
- Google Aiming at 100 Billion Pages?
- Move over pagerank: Google’s looking at phrases?
Google may be using this phrase-based indexing system or something very similar to it, but we don’t know that for certain. If they are, it’s possible that at some point in time, they might tell us what they believe are the top phrases for our websites and let us suggest changes to those phrases.
The Google patent application is:
Integrating External Related Phrase Information into a Phrase-based Indexing Information Retrieval System
Invented by Anna L. Patterson
Assigned to Google
US Patent Application 20090070312
Published March 12, 2009
Filed September 7, 2007
An information retrieval system uses phrases to index, retrieve, organize and describe documents, analyze documents, and store the analysis results as phrase data.
Phrases are identified that predict the presence of other phrases in documents. Documents are indexed according to their included phrases. Related phrases and phrase extensions are also identified.
Changes to existing phrase data about a document collection submitted by a user are captured and analyzed, and the existing phrase data is updated to reflect the additional knowledge gained through the analysis.
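The abstract says the system identifies phrases that “predict the presence of other phrases.” The sketch below is only an assumption about how such a measure could work, not the method in the filing: it treats each document as a set of phrases and calls two phrases related when they appear together in more documents than chance alone would predict. The function name `related_phrases` and the `min_ratio` threshold of 1.5 are hypothetical, not values taken from the patent.

```python
from collections import Counter
from itertools import combinations


def related_phrases(docs, min_ratio=1.5):
    """Return {phrase: set of related phrases} for a list of documents,
    where each document is given as a set of the phrases it contains.

    Hypothetical rule: phrases j and k are 'related' when the number of
    documents containing both is at least min_ratio times the number
    expected if the two phrases occurred independently."""
    n = len(docs)
    df = Counter()       # number of documents containing each phrase
    pair_df = Counter()  # number of documents containing both phrases of a pair
    for doc in docs:
        phrases = set(doc)
        df.update(phrases)
        # Sort so each pair is always counted under the same key.
        for a, b in combinations(sorted(phrases), 2):
            pair_df[(a, b)] += 1

    related = {p: set() for p in df}
    for (a, b), observed in pair_df.items():
        expected = df[a] * df[b] / n  # expected co-occurrences under independence
        if observed / expected >= min_ratio:
            related[a].add(b)
            related[b].add(a)
    return related


docs = [
    {"baseball stadiums", "home plate", "bleachers"},
    {"baseball stadiums", "home plate"},
    {"football stadiums", "end zone"},
    {"football stadiums", "end zone", "bleachers"},
]
print(related_phrases(docs)["baseball stadiums"])  # -> {'home plate'}
```

In this toy collection “bleachers” shows up alongside both kinds of stadium, so it never co-occurs with either one more than chance would predict and ends up related to nothing.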
Top Phrases for a Site
When the phrase-based indexing system explores phrases that show up on your pages, it will also look for related phrases that show up on other pages and other sites that use your phrases. The appearance of phrases and related phrases on your site and other sites is an important part of how this system indexes pages.
In addition to identifying the pages on which particular phrases and related phrases occur, the indexing system can also determine a “set of representative or significant phrases for a particular website.”
The “top phrases” for a website might be considered indications “of the queries for which the website is likely to be relevant.”
On a website, the phrase-based indexing system might look at each page to decide upon the top phrases for that page, using phrase-based indexing methods, and then aggregate those top page phrases to determine the top phrases for the site as a whole.
Phrases on pages that are closer to the site’s root directory (the directory where the home page exists) might be given more weight than phrases on pages that are deeper in the site’s page hierarchy.
So, if the phrase “baseball stadiums” is found at a page with the URL “http://www.example.com/baseball-stadiums.htm,” and the phrase “football stadiums” is found on a page with the URL “http://www.example.com/football/football-stadiums.htm,” the phrase “baseball stadiums” might be given a higher score than “football stadiums.”
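Here is a rough sketch of that aggregation, assuming each page already has its own phrase scores. The weighting `1 / (1 + depth)` and the function names are hypothetical choices made for illustration, since the filing, as described here, only says that pages nearer the root might count for more. The example uses the two URLs above, with both phrases scoring the same on their own pages:

```python
from collections import defaultdict
from urllib.parse import urlparse


def directory_depth(url):
    """Number of directories between the site root and the page."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return max(len(segments) - 1, 0)


def site_top_phrases(page_phrases, k=10):
    """Aggregate per-page phrase scores into site-level top phrases.

    page_phrases maps a page URL to {phrase: score}. Each page's scores are
    multiplied by 1 / (1 + depth), a hypothetical decay so that pages nearer
    the root directory count for more."""
    totals = defaultdict(float)
    for url, phrases in page_phrases.items():
        weight = 1.0 / (1 + directory_depth(url))
        for phrase, score in phrases.items():
            totals[phrase] += score * weight
    # Highest combined score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]


pages = {
    "http://www.example.com/baseball-stadiums.htm": {"baseball stadiums": 1.0},
    "http://www.example.com/football/football-stadiums.htm": {"football stadiums": 1.0},
}
print(site_top_phrases(pages))
# -> [('baseball stadiums', 1.0), ('football stadiums', 0.5)]
```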
The patent application covers how a site owner might change the “top phrases” for a site, how Google might look at those suggestions for change, and the impact the changes might have on the phrase-based indexing system.
If Google starts letting us look at the “top phrases” for our pages, and gives us a chance to change those phrases, it might be worth looking more deeply at the patent filing.
For now, it might be helpful to look at the pages of your site and consider how well readers understand the topics and concepts you want those pages to express, and how well search engines might index them.
For example, if you have a page that is intended to describe “baseball stadiums,” do other “related” phrases that people might expect to see on a page about those stadiums appear on your page, such as “ballpark,” “bleachers,” “scoreboard,” “big leagues,” “playing field,” “pitcher’s mound,” “home plate,” “infield,” “outfield,” “first base,” “center field,” “dugouts,” and others? What “related” phrases appear on other pages about baseball stadiums?
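One way to run that check on your own pages is a small script that looks for each phrase in a page’s text. This is a sketch for site owners, not anything a search engine provides. The list of related phrases is hand-picked from the example above, and whether a search engine would treat those phrases as related is an assumption. The function name `related_phrase_audit` is hypothetical.

```python
import re


def related_phrase_audit(page_text, related):
    """Split a list of related phrases into those found on the page and those missing.

    Matching is case-insensitive and respects word boundaries, so 'field'
    would not match inside 'outfield'."""
    text = page_text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    found, missing = [], []
    for phrase in related:
        pattern = r"\b" + re.escape(phrase.lower().replace("\u2019", "'")) + r"\b"
        (found if re.search(pattern, text) else missing).append(phrase)
    return found, missing


page = "Our guide to baseball stadiums covers every ballpark, from the bleachers to home plate."
related = ["ballpark", "bleachers", "scoreboard", "home plate", "dugouts"]
found, missing = related_phrase_audit(page, related)
print(found)    # -> ['ballpark', 'bleachers', 'home plate']
print(missing)  # -> ['scoreboard', 'dugouts']
```

The “missing” list is a prompt for a writer to consider, not a list of words to stuff into a page.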
In a phrase-based indexing system, the use of phrases and “related” phrases on your pages, and on other sites, might determine how well your pages are indexed in the search engine. The “top phrases” for your pages may be the ones that the search engine decides are most relevant to the queries people use to find your pages.
If Google isn’t using a phrase-based indexing system, the use of related phrases might work to expand the number of queries that your pages show up for in search results anyway.
30 thoughts on “What are the Top Phrases for Your Website?”
Could this be a turnaround from LSI… instead of the search engine being able to work out the theme of a website by using related phrases, it would seem that this would offer the webmaster insight into these phrases. Possibly even a way to tag them. Interesting. The end to keyword stuffing? We can only hope.
I’ve always thought that might be a factor. If you’re optimising a page for the keyword ‘wedding venue’, having words like church, married and even ‘big day’ might be contributing factors. On a site where ‘Wedding Venues’ may be a hard phrase to incorporate into the text, related phrases could save the day.
There seem to be a number of people who have somehow fixated on Latent Semantic Indexing (LSI) in association with these phrase-based indexing patent filings, and need to put aside that notion. LSI is something else, completely different, and these patent applications have nothing to do with it. If you want to learn about LSI and the myths that many marketers perpetuate about it, check out this series of tutorials:
SVD and LSI Tutorial 1: Understanding SVD and LSI
There’s an introductory section in the first part of that tutorial titled “Search Engine Marketers and their LSI Myths” which is worth reading.
Good points. We don’t know whether or not Google is even using phrase-based indexing, so there might not be any value in using related phrases from that stance. However, using related concepts and phrases within the content of a page makes for a richer, more meaningful experience for a reader, a better written page overall, and the opportunity to perhaps rank for some of those related phrases.
I’ve seen too many posts on keyword research that look at how to use keyword selection tools, or dictionaries, or thesauruses, or tools that will mine pages or search results for related terms, while ignoring the most important sources of information on keywords – the audiences that sites are intended for. Keyword research doesn’t start by sitting down with wordtracker or keyword discovery or other keyword research tools open in front of you.
Keyword research begins by learning about the audience or audiences that a site is intended for, by conversations with the site owners or administrators about their audiences. It involves understanding the products or services or information offered, as well as the objectives behind the publication of the site. It requires learning about the environment that the site exists within on the Web, and within the market that it inhabits if it is engaged in commerce, or the conversations that others are having on the topics that it might cover if its intent is to share information and stimulate conversation.
What I like about the approach described in this patent application is that it might enable site owners to see what a search engine believes that their site is about, and make changes to the actual content of the site itself to reflect what it should be about – concepts that their audience might truly be interested in.
Twitter is clever conversation. Google’s search results return formatted, SEO’d conversations. Twitter’s conversations, according to a comment by Wired on Twitter’s home page, are “… almost like ESP.” ESP. Spontaneity. 140-character phrases in real time. Novelty. What makes Twitter work is that it houses the voices of “novel chatter” (phrase data), a built-in phrase indexing system. Google is attempting to add, among other concepts, “time” and “geography” to search terms (nouns) and to educate individuals (web administrators and others) on how to search by a “phrase,” so that Google’s SERPs will return relevant information, as was originally intended by Google’s algorithm, eliminating first-page spam results without any hand intervention by Google. Google realizes that content/web creators are not marketers and are, as you said Bill, “….ignoring the most important sources of information on keywords – the audiences that sites are intended for. Keyword research doesn’t start by sitting down with wordtracker or keyword discovery or other keyword research tools open in front of you.”
Keyword research starts with understanding your customer/audience and then developing a creative idea/campaign from that insight. Remember: creativity and novelty can work just as well as, if not better than, related terms.
Isn’t a patent granted based upon the questions, “Is it new, is it novel, does it solve a business problem?” Google’s patent, I believe, is attempting to educate users on the value of “novelty” in our online searches and in our content creation. This schooling has many applications and rewards. Recently I captured a screen shot of Google’s search box for the term “Super Bowl” which returned, highlighted, drop-down results for “2009, winners, history, 2008, 42, 43, tickets,” – etc. “Super Bowl Sunday” and “Super Bowl 2009” are both related, but conceptually can be viewed differently limited only by a creative campaign fueled by developing related search-phrase content. Google knows what is “new”, what is “novel” and how to solve a “business problem” — advertising revenue.
You raise some very interesting points. I’m not sure that we can say that this patent filing by itself is an attempt to reach out to an audience of webmasters, in an attempt to educate them. Google has other channels of communication that might be more effective, such as their many blogs, their webmaster groups, and their help pages.
But, if Google develops the technology behind the patent filing, and makes it available to site owners, showing them the “top phrases” for their web sites, it might just start getting site owners and administrators to think more about the phrases that do appear on their pages, and the phrases that don’t. And it might start getting them to think in terms of how they express the concepts on their pages, through the use of phrases and related phrases.
Creativity, novelty, and innovation in marketing, whether online or offline, are important things to keep in mind when trying to reach an audience, and those things can often involve getting people to think about products or services or information in new ways, and with new terms and phrases. That can be challenging when a search engine might rerank search results based upon phrases and related phrases that already exist on other pages on the Web, especially older pages that have been on the Web for a while.
As you note, Twitter provides insight into conversations and the language that people are using today, and it would be a useful source of information for a phrase-based indexing system. Another source of information on the phrases that people are using is the query terms that people type into search engines.
Something described in a patent (or at least a utility patent) does need to be new and nonobvious, but the third requirement usually isn’t described as “solving a business problem” so much as being “useful.” The US Patent and Trademark Office describes those attributes of a patent here:
It’s possible that knowing what Google might think are the “top phrases” for a site, and exploring phrases that might be “related,” might be helpful in coming up with new and creative ideas for associating different concepts and words to services or goods or information that others aren’t using…
An SEO campaign could involve a number of steps that could improve how visible you might be in search engines. Some of those steps might involve rewriting some of the content on your pages and creating new content and new pages, while others might involve making your site more search engine friendly, or conducting keyword research to see whether the words you are choosing to emphasize and optimize your pages for are a relevant and appropriate match for the words that people searching for your services will use in search engines, and will expect to see on your pages. Increasing the quality and quantity of links to your site could also play a role in how well your site shows up in search results.
How can SEO and a search engine marketing campaign improve the Leadsmarketer website’s (www.leadsmarketer.com) position in the search engines? Our marketing and sales department invested a lot of resources in writing all the content for our website, but we just can’t seem to rank high enough in the engines, while our competition is on top. Do we have to rewrite it all over again?
Thanks. I agree. When I read through the patent filing, I was reminded of how Google enables you to look at the site links for your site through Webmaster Central, if there are any, and to block ones listed while providing some feedback as to why you might want those blocked.
Including the “top phrases” for a site in Webmaster Central would be a nice addition.
A negative keyword list isn’t covered in the patent application, but I think it might be an idea that Google should explore as well. Webmaster Central does tell us about some of the queries that our sites are being found for, and I’ve seen some that probably aren’t good matches.
I think the best use of this new feature would be to add a report to Webmaster Central that would show webmasters how Google views their site…in other words, which keywords are the most relevant to your site. This would clearly show webmasters what keywords they need to add to their site or focus more on. It would also show webmasters if there are keywords that Google thinks are relevant, but are not. If the webmaster can create a negative keyword list of sorts that would be great.
Most of this is over my head, but I like the fact that I could get Google to actually help me find the best keyword phrases for my site. For those of you who really understand this, I would think that it would be a huge benefit.
Am I wrong?
Hi Swing Man,
Close. It wouldn’t help you find the best keyword phrases for your site, but would rather tell you what Google sees are the top phrases for your site now, based upon phrases that you actually use on your pages, and phrases that are related to those phrases.
If you didn’t agree, you could suggest other phrases that you think are better matches for your site.
While you could make those suggestions, you might also consider rewriting the content on your pages, or adding new content that emphasizes other phrases that might be more in line with the phrases you hope Google will see as the top phrases for your site. According to the definition in the patent filing, the “top phrases” for your site are ones that indicate what queries your site might be most relevant for.
This ability to see “top phrases” isn’t available yet, and it might not ever become available. But, it’s possible that Google might make it available to us in the future. If they don’t, it may still be helpful to think about the phrases that you might be targeting on your pages, and whether or not you include other phrases on those pages that might be related to them.
You’re welcome, Rob.
As always thanks for the good information!!! With Regards, Rob
Great article; you raise some very interesting points. I am working on SEO for my company and overseeing about 10 websites right now that need some SEO work, and articles like this help me in many ways.
Thank you, Evan.
Happy to hear that my posts have been helpful to you.
I think some black hat SEOs could use this as a tool for gaming the system, but I guess that is nothing new.
Hi People Finder,
I didn’t go into much of the details on how this system might attempt to guard against abuse, but there are aspects of it that do. If the suggested phrases have little or nothing to do with actual content that appears upon the pages of a site, then it’s not too likely that they would be accepted by the search engine.
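The safeguard described above could be sketched along these lines. The sketch assumes the site’s pages have already been reduced to sets of phrases and that a map of related phrases exists, for instance from a co-occurrence analysis. Both acceptance rules, and the name `screen_suggestions`, are hypothetical illustrations rather than the method in the patent filing.

```python
def screen_suggestions(suggestions, site_pages, related, min_pages=1):
    """Split suggested phrases into accepted and rejected lists.

    site_pages: list of sets, the phrases found on each page of the site.
    related: {phrase: set of related phrases}.
    Hypothetical rules: accept a suggestion if it appears on at least
    min_pages pages, or if it is related to some phrase found on the site."""
    site_phrases = set().union(*site_pages)
    accepted, rejected = [], []
    for phrase in suggestions:
        on_pages = sum(phrase in page for page in site_pages)
        supported = any(phrase in related.get(p, set()) for p in site_phrases)
        (accepted if on_pages >= min_pages or supported else rejected).append(phrase)
    return accepted, rejected


site_pages = [{"baseball stadiums", "home plate"}, {"bleachers"}]
related = {"baseball stadiums": {"ballpark"}}
suggestions = ["baseball stadiums", "ballpark", "cheap concert tickets"]
accepted, rejected = screen_suggestions(suggestions, site_pages, related)
print(accepted)  # -> ['baseball stadiums', 'ballpark']
print(rejected)  # -> ['cheap concert tickets']
```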
William, don’t you think that these top phrases won’t be much help to a small webmaster, because he can’t compete with the big players out there? Top phrases mean more competition.
The “top phrases” don’t mean more competition, but rather would provide a look, offered by Google, at the phrases that already exist on pages of a site that the site might rank best for under a phrase-based indexing system.
Small businesses often have the ability to react quicker than many large businesses, and the ability to see what Google might think are the top phrases for their sites would give them the opportunity to make changes to the content of their site (and possibly to their business models) that it might take larger businesses more time to implement.
Small businesses also often have the ability to address needs in a marketplace that larger businesses may not be able to focus upon, because it might take a large business too long to redevelop its product or service lines, or the demand might not be large enough to make responding to it very profitable for a large business.
For example, consider two companies, a large and a small one, that sell replica baseball jerseys. The large business has spent a lot of time and effort on creating “authentic” replicas of major league baseball jerseys. The small business has focused more on creating personalized and special event type jerseys. One “top phrase” that the large company may be shown is “authentic baseball jerseys” and a top phrase that the small company may see, or may like to see, for their site is “personalized baseball jerseys.”
The small business might research the marketplace and see that no one is offering older baseball uniform jerseys, and that the phrase “vintage jerseys” gets much more traffic than “personalized baseball jerseys.” Their manufacturing process could be retooled in a couple of weeks to also offer older baseball jerseys, so they do some more research and decide to move forward. They add some pages to their site offering older baseball jerseys, and allow those jerseys to be personalized as well. That addition might have taken the larger company months or years to decide upon and move forward with. There also may not seem to be enough demand for the larger company to sell older jerseys at the many retail shops where it offers jerseys, so it decides not to move forward. In this area, where there might be a smaller amount of demand in the market, the smaller company can make a profit that the larger company can’t.
After the changes are made to the smaller company’s website, they notice that “vintage jerseys” isn’t showing up as a top phrase for their site, even after they suggest the change to Google. They work on their pages some more to help them rank well for that term, and submit the suggestion again. If the changes they make are effective, they should start to see more visitors arriving for that phrase in their analytics programs. If Google starts showing them the phrase as a “top phrase” for their site, it’s more a confirmation of their efforts than anything else.
Thank you for the valuable article. Instead of always optimizing for generic keywords, we should concentrate more on long tail, business-specific keywords, where conversion rates will be much higher.
You’re welcome, Listorbit.
Ideally, you would optimize your pages for terms and phrases that you can reasonably compete for, that are appropriate to the objectives of your site, and to the goals of your intended visitors.
In some cases, those might be more general and more competitive keyword phrases, but you can also optimize other pages for more specific terms that might be less competitive. Including additional related phrases on pages that you’ve optimized for another phrase might increase the number of long tail keyword phrases that your pages can be found for.
Hi! I’m officially inviting you to share your views directly at the SEO discussion forum talkseo.com. You could try asking people there to visit your SEO-based blog, because your entries are dead-on, user-friendly, and useful! Thanks!
Thank you for the invitation. I’m not sure that I have a lot of time to participate right now.
Hi, I’m looking for technology that can crawl around my planned civic-oriented social network website and capture, analyze and harvest (“scrape off” is a term someone suggested) key words, phrases, clicks etc. from text, video or audio that our members create on our site, and that can be interpreted or packaged into valuable insights for marketers, elected officials, lobbyists etc.
It sounds like you want a rich mix of data visualization and analytics within one tool that can not only capture text but also audio and video as well. I’m not sure if there is an existing tool that can capture that kind of variety of information, including clicks and more.
There are some interesting tools at Digg Labs that start to do the kind of thing it sounds like you’re interested in.
Tree maps are also interesting: http://www.cs.umd.edu/hcil/treemap-history/
One of the difficulties of a site where the content is created by the people who use the site is gaining insights into what those people are doing with the site. Data visualizations can help put all of that into perspective. As for clicks, entrances into the site, site searches, and things like that, an analytics tool might be helpful.
Very interesting discussion on using the words “wedding venues” to optimise a wedding-related web page. We at 5 Star Wedding Directory.com also found it difficult to use the words “wedding venues” in our on-page SEO, but somehow managed to come up with something that would be readable as well as good for the search engines. One must use a bit of imagination, and focus on the user first and the search engines after.