How Does Google Rank Blogs?
Interested in finding a blog about a particular topic or place, rather than individual blog posts? A recent change to Google’s blog search is intended to make that easier. Interestingly, the change comes about a month after Google was granted a patent on how it may index and retrieve blogs. Google announced the change on one of its blogs:
Recently, our blog search team made it much easier to find full blogs about your query, rather than single posts on the topic. This is especially useful if you’re looking for bloggers that post on an ongoing basis about the subject of your query.
— This week in search 8/27/10, The Official Google Blog.
The Google patent describes how the search engine might rank blogs, by collecting information from a blog’s feed and from the blog itself to attempt to understand what a blog might be about. It may collect information such as the content of posts, post titles, post authors, blog authors, the blog’s title, profile information about the authors, blog roll information, and possibly other information about the blog.
Rather than focusing upon ranking individual blog posts for a particular query, the focus of this patent appears to be upon determining whether a blog itself should be ranked for that query.
I’m not sure how big a departure this is from what Google has been doing in the past, but the change described on the Official Google Blog does give us some new information about blog rankings that we haven’t had access to before.
How new is this change and why does it matter?
If you start a search at Google’s home page for the term “security,” and then choose “blogs” in the left sidebar, you will see three blogs listed at the top of the blog search results under a link that says “Related blogs about security.” A line separates those three blogs from the results below them, which appear to be a ranking of individual blog posts about security.
When you arrive at the Google Blog Search results after starting your search from the Google home page, Google also shows a link in the sidebar, after you click on the “More Tools” section, that lets you choose between “posts” and “home pages.” If you compare the sites that show up when you click on “Related blogs about security” with the “home pages” results, they appear to be the same list. Either route takes you to a set of search results with the heading “Homepages.”
On the other hand…
If you start a search for “security” at Google’s Blog Search home page, the results are different. You don’t see a link at the top for “Related blogs about security,” but you do see an indented list of five blogs at the top of the results after a text label of “Related Blogs:”. Those results include the same top three blogs listed when you arrive at the results starting from Google’s web search, but the last two blogs in this list of “related blogs” are different.
I’ve tried a few other queries, and the results do seem to differ, though with some overlap, depending on whether you start at Google’s home page or at Google Blog Search.
Why is there a difference?
Google Blog Search has been showing “related blogs” at the top of blog searches for more than a couple of years. Are those related blogs being ranked differently than the homepage blog results we see when we arrive at Google Blog Search after starting from Google’s home page? The differences in the results presently showing indicate that they might be. It’s possible that the changes we see when starting at Google’s home page will be mirrored at some point in the results we see when starting at Google’s Blog Search.
It’s possible that Google may have changed more than an interface. The algorithm used to determine related blogs may have changed somewhat as well.
One nice thing about this change is that we can now see a possibly much longer list of “related blogs” or “home pages” on topics that we search for, and get a sense of how close a blog home page might be to ranking as one of the top three homepages shown at the top of those search results.
About the Patent
The Google patent is:
Indexing and retrieval of blogs
Inventors: Alex Khesin, Andriy Bihun, Eduardo Morales, Jason Goldman, Jeff Reynar, and Vinod Marur
Assigned to Google Inc.
US Patent 7,765,209
Granted July 27, 2010
Filed: September 13, 2005
A system may receive a feed associated with a blog. The system may extract information from the feed and the blog and create a hybrid document based on the extracted information. The system may further use the hybrid document to determine a relevance of the blog to a search query.
The patent itself is interesting for a number of reasons beyond its apparent focus on helping to find blog home pages. It was filed on the same day as the patent application that I wrote about three years ago in Positive and Negative Quality Ranking Factors from Google’s Blog Search (Patent Application). It also shares the same list of inventors, so these two patents are very much related to one another.
There isn’t a lot of overlap in the published descriptions of the two patents, however. As I mentioned above, this patent appears to focus on helping to find blogs rather than blog posts.
What’s the difference?
Say you wanted to find a blog from someone who writes about life in Virginia rather than someone who wrote a blog post about a trip to Virginia. The approach in this patent is intended to help you find the blogger writing about life in Virginia. The other patent filing is intended to help you find the blog post about the person taking a trip to Virginia.
This patent, for instance, might look at the profiles of the bloggers and notice that they live in Virginia, whereas the other patent wouldn’t. It might collect other profile information to use in indexing as well, including age, gender, etc.
It might also look at other types of information to create a blog homepage rank, including content from the blog’s feed and from the blog itself. These can include authors’ names, post titles, and post content.
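To make the feed side of that a little more concrete, here is a minimal sketch of pulling those fields out of an RSS 2.0 feed. The sample feed, the example.com addresses, and the extract_feed_fields function are made up for illustration; the patent doesn't say how Google actually parses feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be fetched from the blog.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Life in Virginia</title>
    <link>http://example.com/</link>
    <item>
      <title>Autumn in the Blue Ridge</title>
      <author>jane@example.com (Jane Doe)</author>
      <description>Leaves are turning along Skyline Drive.</description>
    </item>
    <item>
      <title>Farmers markets in Warrenton</title>
      <author>jane@example.com (Jane Doe)</author>
      <description>Where to find local apples this month.</description>
    </item>
  </channel>
</rss>"""


def extract_feed_fields(feed_xml):
    """Collect the blog title, home page, post titles, authors, and post content."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = channel.findall("item")
    return {
        "blog_title": channel.findtext("title", default=""),
        "home_page": channel.findtext("link", default=""),
        "post_titles": [item.findtext("title", default="") for item in items],
        # A set removes duplicate author entries; sorting keeps the order stable.
        "authors": sorted({item.findtext("author", default="") for item in items}),
        "post_content": [item.findtext("description", default="") for item in items],
    }


fields = extract_feed_fields(SAMPLE_FEED)
print(fields["blog_title"], fields["post_titles"])
```

The same kinds of fields could then be collected from the blog’s own pages, so that the two sources can be combined or compared.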
It may also try to compare what is found in a blog feed and on the blog itself to determine whether or not a blog is legitimate, or might be spam. As we’re told in the patent:
For example, if the post content extracted from the blog feed does not match the post content extracted from the blog/post documents, this may be an indication that the feed and/or documents are not legitimate. That is, an individual may be attempting to spam a search engine into ranking that individual’s blog/post more highly than it would ordinarily be ranked. In this situation, no hybrid document may be formed for this blog/blog post, or be given a very low rating.
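A check along those lines could be sketched as follows. This is only an illustration: the word-overlap measure, the 0.5 threshold, and the function names are my own assumptions, since the patent doesn't spell out how a "match" would be measured.

```python
import re


def tokens(text):
    """Lowercase a string and return the set of words in it."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def feed_coverage(feed_text, page_text):
    """Fraction of the feed's words that also appear on the blog page.

    Measuring from the feed's side means a feed that carries only an
    excerpt of a post can still score well.
    """
    feed_words = tokens(feed_text)
    if not feed_words:
        return 0.0
    return len(feed_words & tokens(page_text)) / len(feed_words)


def looks_legitimate(feed_text, page_text, threshold=0.5):
    """Assumed rule of thumb: flag the blog when too little of the feed matches."""
    return feed_coverage(feed_text, page_text) >= threshold


page = "Leaves are turning along Skyline Drive. We hiked all weekend."
honest_feed = "Leaves are turning along Skyline Drive."
stuffed_feed = "cheap pills casino loans best deals"

print(looks_legitimate(honest_feed, page))
print(looks_legitimate(stuffed_feed, page))
```

A blog that fails a check like this might get no hybrid document at all, or a very low rating, which is the outcome the patent describes.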
The patent describes how it may create a “hybrid document” about a blog out of information from XML feeds, blog posts, and pages linked to from those feeds and posts, such as profile pages.
But it doesn’t return those hybrid documents in response to a searcher’s query. Instead, it may return a link to the blog’s home page, or possibly even to a blog post, based in part upon the information found in the hybrid documents.
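Here is a rough sketch of that idea: a hybrid document is scored against the query, but what comes back is the blog’s home page. The HybridDocument class, the term-frequency score, and the sample blogs are all hypothetical; the patent doesn't disclose an actual scoring formula.

```python
import re
from dataclasses import dataclass, field


@dataclass
class HybridDocument:
    """Text gathered about one blog from its feed, its pages, and author profiles."""
    home_page: str
    blog_title: str = ""
    post_titles: list = field(default_factory=list)
    post_content: list = field(default_factory=list)
    author_profiles: list = field(default_factory=list)

    def text(self):
        parts = [self.blog_title, *self.post_titles,
                 *self.post_content, *self.author_profiles]
        return " ".join(parts)


def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def score(doc, query):
    """Assumed score: the share of the hybrid document's words that are query terms."""
    doc_words = words(doc.text())
    if not doc_words:
        return 0.0
    query_terms = set(words(query))
    return sum(1 for w in doc_words if w in query_terms) / len(doc_words)


def rank_blogs(docs, query):
    """Return home page URLs, best match first; the hybrid documents stay internal."""
    scored = [(score(d, query), d.home_page) for d in docs]
    matches = [pair for pair in scored if pair[0] > 0]
    return [url for _, url in sorted(matches, key=lambda p: p[0], reverse=True)]


docs = [
    HybridDocument("http://travel.example.com/", "World Travel Diary",
                   ["A weekend trip to Virginia", "Two weeks in Peru"],
                   ["We flew to Lima and hiked to Machu Picchu."],
                   ["Lives in Toronto."]),
    HybridDocument("http://virginia.example.com/", "Life in Virginia",
                   ["Autumn in the Blue Ridge", "Farmers markets in Warrenton"],
                   ["Leaves are turning across Virginia."],
                   ["Lives in Richmond, Virginia."]),
    HybridDocument("http://cooking.example.com/", "Weeknight Cooking",
                   ["Five quick pasta dinners"], ["Boil the water first."], []),
]

print(rank_blogs(docs, "virginia"))
```

In this toy data, the blog whose title, posts, and author profile all mention Virginia ranks ahead of the travel blog with a single post about a trip there, which is the distinction the patent seems to be after.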
It’s possible that Google has been using a process like the one described in this patent ever since they started showing “related blogs” in Google’s blog search.
The new “posts” and “home pages” choices in Google’s search interface, which are available when you start your search on the Google home page and then switch to Blog Search, make that process more visible and allow you to see more than just the first few results.
It’s possible that the different results that we see when starting at Google’s home page for “related blogs” are calculated somewhat differently than the “related blogs” we see when starting at Google Blog Search.
This newly granted patent may give us some clues as to what Google might be looking at in both approaches, such as the possibility that Google looks at associated profile pages to learn more about the bloggers behind a blog.