Imagine that Google assigns categories to every webpage or website that it visits. You can see categories like those for sites in Google’s local search. Now imagine that Google has looked through how frequently certain keywords appear on the pages of those websites, how often those pages rank for certain query terms in search results, and user data associated with those pages.
One of my local supermarkets has a sushi bar, and they may even note that on their website. But the keyword phrase [sushi bar] is more strongly associated with documents in a category of “Japanese Restaurants,” based upon how often that phrase tends to show up on Japanese restaurant sites and how frequently Japanese restaurant sites tend to show up in search results for that phrase.
Since Google can make a strong statistical association between the query [sushi bar] and documents that would fall into a category of “Japanese restaurants,” it’s possible that the search engine might boost pages that have been categorized as “Japanese restaurants” in search results on a search for [sushi bar]. My supermarket [sushi bar] page might not get the same boost.
That’s something that a Google patent granted earlier this week tells us.
The patent presents this idea of creating categories for sites and associating keywords with those categories, so that sites can be boosted in rankings when they are both relevant for those query terms and fall within those categories, within the context of local search. But the patent tells us that it can use this process in other searches as well.
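To make the boosting idea a little more concrete, here is a minimal Python sketch. It is an illustration only and not something taken from the patent: the function name rerank, the multiplicative boost of 1.5, the scores, and the example URLs are all assumptions made up for this example.

```python
def rerank(results, query, category_keywords, boost=1.5):
    """Boost results whose category has the query as an associated keyword.

    results: list of dicts with 'url', 'score', and 'category' keys.
    category_keywords: dict mapping a category name to a set of keywords.
    The multiplicative boost is an assumption made for illustration.
    """
    normalized_query = query.lower()
    boosted = []
    for result in results:
        keywords = category_keywords.get(result["category"], set())
        factor = boost if normalized_query in keywords else 1.0
        boosted.append({**result, "score": result["score"] * factor})
    # Highest score first.
    return sorted(boosted, key=lambda r: r["score"], reverse=True)


# Hypothetical data: the supermarket page starts out with the higher score.
results = [
    {"url": "supermarket.example/sushi-bar", "score": 1.0, "category": "Supermarkets"},
    {"url": "sushi-place.example", "score": 0.8, "category": "Japanese Restaurants"},
]
category_keywords = {"Japanese Restaurants": {"sushi bar"}}

ranked = rerank(results, "sushi bar", category_keywords)
for result in ranked:
    print(result["url"], result["score"])
```

In this toy example the Japanese restaurant page moves ahead of the supermarket page for [sushi bar], because only its category has that phrase as an associated keyword. How much weight a real search engine would give such a signal, if any, is not something the patent spells out.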
Keywords associated with document categories
Invented by Tomoyuki Nanno, Michael Riley, and Gaku Ueda
Assigned to Google
US Patent 7,996,393
Granted August 9, 2011
Filed: September 28, 2007
A system extracts a pair that includes a keyword candidate and information associated with a document from multiple documents, and calculates a frequency that the keyword candidate appears in search queries and a frequency that the pair appears in the multiple documents. The system also determines whether the keyword candidate is a keyword for a category based on the calculated frequencies, and associates the keyword with the document if the keyword candidate is the keyword for the category.
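The abstract above can be read as a simple counting exercise, and a rough Python sketch might look like the following. This is my own simplification under stated assumptions, not the patent's actual method: the function name keywords_for_category, the two thresholds (min_query_count and min_share), the substring matching, and the sample documents and queries are all hypothetical.

```python
from collections import Counter


def keywords_for_category(documents, query_log, min_query_count=2, min_share=0.5):
    """Associate keyword candidates with document categories.

    documents: list of (category, text) pairs.
    query_log: list of query strings; each distinct query is a candidate.
    Both thresholds are assumptions chosen for illustration.
    """
    # Frequency with which each candidate appears in search queries.
    query_freq = Counter(q.lower() for q in query_log)

    # Frequency with which each (candidate, category) pair appears in documents.
    pair_freq = Counter()
    candidate_total = Counter()
    for category, text in documents:
        lowered = text.lower()
        for candidate in query_freq:
            if candidate in lowered:
                pair_freq[(candidate, category)] += 1
                candidate_total[candidate] += 1

    # Keep a candidate for a category when it is queried often enough and
    # most of the documents containing it fall in that category.
    keywords = {}
    for (candidate, category), count in pair_freq.items():
        share = count / candidate_total[candidate]
        if query_freq[candidate] >= min_query_count and share >= min_share:
            keywords.setdefault(category, set()).add(candidate)
    return keywords


# Hypothetical sample data.
documents = [
    ("Japanese Restaurants", "Our sushi bar serves fresh nigiri"),
    ("Japanese Restaurants", "Visit the sushi bar for sake"),
    ("Japanese Restaurants", "Sushi bar and ramen"),
    ("Supermarkets", "Groceries, deli, and a sushi bar"),
    ("Supermarkets", "Weekly grocery deals"),
]
query_log = ["sushi bar", "sushi bar", "sushi bar", "grocery deals"]

associations = keywords_for_category(documents, query_log)
print(associations)
```

With this sample data, [sushi bar] ends up associated with “Japanese Restaurants” but not with “Supermarkets,” since three of the four documents that mention it are Japanese restaurant pages. [grocery deals] is dropped because it appears in only one query.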
If you have access to Google’s Webmaster Tools for a website, the section on “Keywords” shows you the “most common keywords Google found when crawling your site,” along with a warning that those should “reflect the subject matter of your site.” Another section of Webmaster Tools shows the queries that your site receives visitors for, how many impressions and clickthroughs from search results that your pages receive, and an average ranking for your pages in those results. An additional section of the Google tools shows the anchor text most often used to link to your site.
If you were to take all of that information that Google provides for your site and try to guess at the category or categories that Google might assign to it, could you? It’s possible that Google is using that kind of information, and more, to determine how your site should be categorized. Google would also be looking at other sites for information such as the frequency of keywords used on their pages and the queries they are found for, both to create those categories and to see how well your site might fall into one or more of them.
Of course, if you verify your business in Google Maps, you can enter categories for your business, but Google may suggest and include other categories as well. For instance, Google insists on including “Website Designer” as a category for my site even though that’s not a category that I’ve ever submitted to them.
And while this patent discusses how it might be applied to local search, it could just as easily be applied to Web search as well, and the patent provides a long list of different types of categories that it might apply to websites, which extend well beyond business types.
I’ve written a number of times in the past that one of the benefits of reading through patents is sometimes the questions that they raise more than anything. Here are a few that I’ve come up with after reading this patent:
Can you get an idea of what categories Google might place your site within after looking at information available in places like Google’s Webmaster Tools?
What category or categories might Google think the pages of your website might belong to?
If those aren’t the right categories, what steps can you take to change Google’s classification of your website?
Are you getting a boost in search results for queries that Google might think are associated with those categories?
If you’re doing keyword research, should you try to understand when Google might be associating certain queries with certain categories?
Added August 16, 2011 – 10:29am (edt)
This patent is somewhat quiet on how Google might assign categories to pages, but I’ve written at least a couple of posts before about how Google might classify pages that I think are worth looking at: