Google’s Query Based Analysis and Reranking of Search Results

Somewhere in an alternative universe, it’s possible that one of the most feared hitters in baseball might have instead been known as one of its greatest pitchers. Babe Ruth started out as a pitcher for the Boston Red Sox in 1914, and when approached about getting his bat into the lineup on a daily basis in 1918, his manager Ed Barrow responded that “I’d be the laughingstock of baseball if I took the best lefthander in the league and put him in the outfield.” A couple of years later, Ruth was sold to New York’s team for an unprecedented $125,000, where he proceeded to hit 54 home runs for the Yankees and began a pretty good career hitting a baseball instead of throwing it at people.

Babe Ruth, well known as a New York Yankee, in a Red Sox uniform.

In 1920, anyone looking for information about the Babe probably wasn’t too interested in his pitching career. Likewise, when someone searches today for [world series champion], it’s likely that they are looking for fresh results. How does a search engine like Google determine when searchers might prefer fresh results, and when they might prefer older results?

Google may monitor the results of search queries over time, and adjust the results of those queries based upon certain factors. Google was granted a patent called Document scoring based on query analysis (US Patent 8,051,071) filed on November 22, 2006 and granted on November 1, 2011. The patent describes a number of different types of query-based analysis that Google might undertake to rerank search results based upon this query analysis.

The patent is a continuation patent of Google’s Historical Data patent, which I described last month in my post, 10 Most Important SEO Patents: Part 2 – The Original Historical Data Patent Filing and its Children. Interestingly, Google filed a number of continuation patent applications for this Query Analysis patent last week. The claims sections of those patent applications focus upon specific aspects of the different factors that might influence rankings in search results.

Query Based Factors

Here are the query based factors appearing in the original patent:

Selections Over Time – When a particular page tends to be chosen by searchers over other pages in a set of search results for a query, that page might be bumped in rankings for that query.

Hot Topics – Some search terms may increasingly appear in queries over a period of time, which can indicate that a topic is hot and has gained in popularity. Pages which are associated with those queries may be boosted in search rankings over pages that aren’t.

Related Hot Topics – When “similar” queries become hot and the amount of search results for those queries increases, pages associated with a query that is similar to those hot queries might be boosted in search results.

Constant Queries with Consistently Changing Results – Some queries remain constant in the amount of searches for them, but the results for those queries tend to change over time, such as a query for “world series champion.” How the results for these queries change may be monitored and used to rank pages accordingly.

Freshness of Documents – How stale or fresh a document is might be based upon factors such as a document’s creation date, growth of anchor text, levels of traffic, changes to content, growth or decrease in links to and from the pages, and other factors. For some queries, recent documents may be considered very important. For example, if a search is for a Frequently Asked Questions (FAQ) page, the most recent version might be much more desirable than an older one.

Google might learn for which queries recent changes are most important by analyzing which documents in search results tend to be selected by users, and by considering how often users favor a more recent document that is ranked lower than an older document in the search results. If a page tends to be included in results for mostly topical queries such as “World Series Champions” rather than more specific queries, such as the name of a team that recently won the World Series, then this by itself might be used to lower the ranking for a page that appears to be stale.

Staleness of Documents – For some queries, older pages might be considered better choices than newer ones, and the decisions in those cases might be based upon how often those older pages are selected in search results over newer documents. If searchers tend to select a lower ranked relatively stale document for a given query over a higher ranked, relatively recent document, the stale page might be adjusted to rank higher.

Overly Broad Pages – If a page tends to rank fairly well for a range of “discordant” queries, that might be a signal that the page is a “web spam page.”

Newly Filed Continuation Patent Applications

The claims sections of the patent applications from last week go into a little more detail on a number of those factors and add to them. These continuation patent applications were all filed on September 26, 2011 and published on January 19, 2012.

Trends Related to Topics and Search Terms

Document Scoring Based on Query Analysis (US Patent Application 20120016889)

This continuation patent application focuses upon analyzing trends related to search queries to determine groups of topics and search terms that increasingly appear in queries over time, and the frequency with which they appear. Pages that appear within those topics that do include the terms or similar terms that are increasingly being searched for might be boosted in results while pages within those topics that don’t include those terms might not.

What this patent application doesn’t tell us is how some search terms might be considered related or similar to others that are trending upward in popularity.
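To make the idea of terms that “increasingly appear in queries over time” a little more concrete, here is a minimal sketch that compares how often each term shows up in queries from an earlier period and a later one. The function name, the growth ratio, and the minimum count are my own assumptions for illustration; the patent application doesn’t give thresholds, and this sketch doesn’t attempt the open question of which terms count as related or similar.

```python
from collections import Counter


def rising_terms(earlier_queries, later_queries, min_ratio=1.5, min_count=2):
    """Return terms whose share of queries grew by at least min_ratio.

    Hypothetical sketch: min_ratio and min_count are illustrative
    thresholds, not values taken from the patent application.
    """
    def term_shares(queries):
        counts = Counter()
        for query in queries:
            # Count each term once per query so long queries don't dominate.
            counts.update(set(query.lower().split()))
        total = max(len(queries), 1)
        return {term: n / total for term, n in counts.items()}, counts

    earlier_share, _ = term_shares(earlier_queries)
    later_share, later_counts = term_shares(later_queries)

    rising = []
    for term, share in later_share.items():
        if later_counts[term] < min_count:
            continue  # Too rare in the later period to call a trend.
        before = earlier_share.get(term, 0.0)
        # A term absent earlier counts as rising once it clears min_count.
        if before == 0.0 or share / before >= min_ratio:
            rising.append(term)
    return sorted(rising)
```

Pages within a topic that include the returned terms would be the candidates for a boost; how large that boost might be is something the claims leave open.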

Access Times to Determine Freshness and Staleness

Document Scoring Based on Query Analysis (US Patent Application 20120016888)

How much time do visitors spend accessing specific pages in search results for particular queries? That can be another factor that might be considered when reranking results, in combination with looking at how frequently those pages are selected in search results for a particular query.

If the amount of time that people spend looking at a particular page decreases from one specific time period to another, that might be an indication that the page is stale, and the ranking of that page might be lowered.

If the amount of time people look at a specific page increases from a specific time period to another, that might indicate that the document is fresh, and its ranking score might be adjusted upwards.

Relevance of those documents would still play a strong role in how they are ranked.
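A simple way to picture the access time comparison is a function that takes an existing relevance-based score and nudges it up or down depending on whether average access time rose or fell between two periods. This is only a sketch under my own assumptions: the function name and the 10 percent step are hypothetical, and the patent application doesn’t say how large an adjustment might be.

```python
def adjust_for_access_time(score, earlier_seconds, later_seconds, step=0.1):
    """Nudge a ranking score based on the change in average access time.

    Hypothetical sketch: `step` is an illustrative adjustment size.
    The score passed in is assumed to already reflect relevance.
    """
    if not earlier_seconds or not later_seconds:
        return score  # Not enough data to compare the two periods.

    earlier_avg = sum(earlier_seconds) / len(earlier_seconds)
    later_avg = sum(later_seconds) / len(later_seconds)

    if later_avg > earlier_avg:
        return score * (1 + step)  # Longer visits: treated as a freshness signal.
    if later_avg < earlier_avg:
        return score * (1 - step)  # Shorter visits: treated as a staleness signal.
    return score
```

Because the adjustment multiplies the existing score rather than replacing it, relevance still does most of the work in this sketch.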

Frequency of Selection

Document Scoring Based on Query Analysis (US Patent Application 20120016874)

This patent application looks at the frequency of selection of specific pages for specific queries. How frequently a certain page is selected in search results over one period of time might be compared to how frequently that page might be selected in search results over a later period. If the page is selected less, it might be lowered in search results. If it is selected more frequently, it might be increased in rankings.
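The comparison described here could be sketched as a selection rate (selections divided by the number of times the page was shown for the query) measured in two periods. The function name, the return labels, and the tolerance value are assumptions of mine, not something found in the claims.

```python
def selection_trend(earlier, later, tolerance=0.02):
    """Compare a page's selection rate for one query across two periods.

    Each argument is a (selections, impressions) pair. Hypothetical
    sketch: `tolerance` is an illustrative dead zone so that tiny
    changes don't trigger a ranking adjustment.
    """
    def rate(selections, impressions):
        return selections / impressions if impressions else 0.0

    change = rate(*later) - rate(*earlier)
    if change > tolerance:
        return "raise"
    if change < -tolerance:
        return "lower"
    return "keep"
```

For example, a page selected 10 times in 100 showings during the first period and 25 times in 100 during the second would come back as "raise".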

When Staleness Might be Preferred

Document Scoring Based on Query Analysis (US Patent Application 20120016871)

For some queries, searchers seem to prefer older results that might be perceived as “stale” pursuant to an analysis of user trends regarding that query. Regardless of that perception, if searchers tend to bypass higher ranking fresh looking results and prefer clicking on older documents, then the search engine might rank older documents higher than fresher ones, even if the older documents don’t rank as well from a relevance perspective.
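One way to think about detecting that preference is to look at click sessions for a single query and count how often the searcher skipped past newer, higher ranked results to select an older page. The sketch below does that with document ages in days; the function name, the session format, and the 50 percent threshold are all hypothetical choices of mine.

```python
def prefers_older_results(sessions, threshold=0.5):
    """Guess whether searchers favor older documents for one query.

    Each session is (ages, clicked_index), where `ages` lists document
    ages in days ordered by rank (best first). Hypothetical sketch:
    `threshold` is an illustrative cutoff, not a value from the patent.
    """
    bypasses = 0
    considered = 0
    for ages, clicked in sessions:
        skipped = ages[:clicked]  # Results ranked above the clicked one.
        if not skipped:
            continue  # Top result was clicked, so nothing was bypassed.
        considered += 1
        # Count the session if the clicked page is older than every
        # higher ranked page the searcher passed over.
        if ages[clicked] > max(skipped):
            bypasses += 1
    return considered > 0 and bypasses / considered >= threshold
```

If this returned True for a query like [Windows 95], a search engine following the claims might rank the older documents above the fresher ones for that query.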

Spam Determination Based Upon Breadth of Rankings, and Authority

Document Scoring Based on Query Analysis (US Patent Application 20120016870)

When a page ranks highly for a wide range of different search queries, then that might be a sign that it is spam, and its rankings may be negatively adjusted.

This patent application provides an exception, which is whether or not that page might be considered an “authoritative document.” An authoritative document might be (1) a government document, (2) a document associated with a web directory, or (3) a document that has maintained at least a threshold rank over time. While the historical data patent and the original query analysis patent did mention how Google might make some exceptions for authoritative documents in the parts of their descriptions related to link based criteria and aspects of ranking history that might be analyzed, that exception wasn’t mentioned in the query-based factors or in the actual claims of those patents.
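The breadth check and its exception can be sketched in a few lines. The three authority conditions come from the patent application as described above; everything else here, including the class and field names and the cutoff of 20 unrelated topics, is an assumption I’ve made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageProfile:
    # Number of unrelated ("discordant") topics the page ranks highly for.
    distinct_topics_ranked: int
    is_government: bool = False
    in_web_directory: bool = False
    # True if the page has maintained at least a threshold rank over time.
    held_threshold_rank: bool = False


def is_authoritative(page):
    """Apply the three authority conditions listed in the application."""
    return page.is_government or page.in_web_directory or page.held_threshold_rank


def looks_like_spam(page, max_topics=20):
    """Flag overly broad pages unless they qualify as authoritative.

    Hypothetical sketch: `max_topics` is an illustrative cutoff, not a
    value from the patent application.
    """
    if is_authoritative(page):
        return False
    return page.distinct_topics_ranked > max_topics
```

In this sketch, a page ranking for 50 unrelated topics would be flagged, while the same page marked as a government document would not.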

Takeaways

If you read through the descriptions of these new patent applications, those descriptions cover a much wider range of topics than the claims associated with them, and are very similar to the descriptions in the Historical Data patent.

There are a lot of interesting things in the descriptions, like the idea that a page that rises too quickly in search rankings over time might be demoted somewhat in those rankings. But the important parts of these patent applications are the claims sections, which focus much more narrowly on specific aspects of query-based analysis of search results and pages.

The claims in this group of newly published patent applications discuss how some topics might quickly become popular, and how pages in search results for popular search terms (or similar search terms) within those topics might be boosted in search results for those topics.

They tell us that pages that are being accessed longer and selected more frequently for specific queries from one time period to another might be boosted in results as well. A couple of the other patent applications include the terms freshness and staleness related to increasing frequencies of access time and selections of specific pages, and tell us that in some cases newer and fresher pages might seem to be preferred by searchers, and in other cases older and staler pages might be better results, and either type of result may also be boosted as appropriate.

We’re also told that pages that rank too well for too many queries might be considered web spam unless they are determined to be “authoritative” pages, such as government pages, web directories, or pages that have maintained high rankings over a period of time. That “authoritative” designation is one of the things that stands out for me with these patents.

Of course, these are only patents and patent applications, and it’s possible that Google might be doing something somewhat different than what is described within the claims of these documents. But it’s worth spending some time thinking about how Google might be analyzing search results for queries performed at their search engine, and how it might influence what searchers see when they perform a search.


30 thoughts on “Google’s Query Based Analysis and Reranking of Search Results”

  1. I like the idea that sites that promote “fresh” and “new” content on a consistent basis will be looked at favorably. I also find the time on site to be an important factor in all of this. It just goes back to the basic format of providing fresh and new “original” content on a regular basis and your rankings will follow because of this.

  2. “When a page ranks highly for a wide range of different search queries, then that might be a sign that it is spam”.
    Hmm. Does it mean that a new page that provides a wide comprehensive analysis of a topic [and is not treated as "authoritative" because it is new], may be downgraded in search results?

  3. Hey Bill,

    Before I was even half way down the page, I knew that you would be mentioning behavioral metrics such as “access time” and “clicks per 100 queries” aka “frequency of selection”.

    It just makes too much sense to incorporate this data into a frequent reranking of the top ten or twenty.

    So many SEOs preach that backlinks are the key to ranking, but if 9 times out of 10 organic listing number two gets selected over organic listing number one, it only makes sense that Google would make note of that behavior and adjust accordingly.

    I have no doubt that social metrics (ones not as obvious as the +1 button) will be used here-forth to rerank the results of search terms.

    Excellent, excellent, excellent.

    Mark

  4. Great point Mark! I think social metrics have already started to show their impact. I agree not fully but in their own ways. When we press search, we get the ‘social’ results first. The remaining results come later. I am a bestselling author and I have realized the importance of social metrics. I also devote quality time to connect using these platforms!

  5. Hi Bill,

    Yes, definitely it’s a really great post, thanks.
    I think “Selection over the time” and “Freshness of the content” are more valuable factors for ranking highest in search results compared to the other points mentioned there.

    And of course freshness of the content is the only thing we webmasters can concentrate on and optimize to fulfill visitors’ requirements.

    Please let me know your feedback on the same.

  6. Wow, a lot to take in in one sitting but a good post nonetheless. I agree with regards to fresh content – every website I work on I tend to set-up a blog to sit alongside it as it makes the site more engaging and allows for a little more social sharing. All important factors… especially with all the recent changes to Google’s algorithm.

  7. Yes I think for best results, back-links are on the way out. I know of a site with only 8 back links that is higher than a site with hundreds (and fairly good and legit). The key difference seems to be only that the higher site regularly produces fresh content.

  8. I’m a bit worried about the patent that states ‘authoritative pages’ are considered less spammy if they rank well on a broad spectrum of keywords than non-authoritative pages. This means that a newly launched website with a varying amount of keywords will not rank sitewide if it does not have any authority. Certain aspects are already built-in by Google. Looking forward to reading more about that.

  9. I think with the changes that have taken place with Google in the last year, this explains a lot. The presence of Google+ will evolve and it will be interesting to see how everything else evolves around it.

  10. Hi Bill,

    I’ve seen a lot of authoritative and relatively informative articles that attempt to analyze Google’s search algorithm, but this is the first which predominately uses patent applications to infer information. What an awesome method! Thank you for outlining the patent facts and your hypotheses in such a coherent and accessible manner.

    Grace

  11. I have been experiencing a serious drop in the serp for single words the last couple of weeks and I’m suspecting it has to do with the algorithm update regarding document freshness. This blog post was an interesting read as it brings up what Google is all about – a search engine for real, human searchers. It proves that you have to think like a seo expert AND a human and combine those in order to succeed with your online business.

  12. Hi Sam,

    Actually, for some queries, older content might actually be favored. For example, someone searching for information about [Windows 95] might be much more interested in older staler results than fresher pages.

  13. Hi Lazar,

    Not sure that many new pages are going to be able to rank well for a wide range of different queries as fairly new pages. But it’s possible. For instance, a site that might be set up as a comprehensive look at a natural disaster and relief efforts associated with it might have pages that fit that profile – being able to rank very highly for a wide range of terms.

    I suspect that kind of recency and topicality might be able to trump some of the search engine’s concerns. As a page focusing on a “Hot Topic,” it might be boosted in search results for those queries, even though it might rank for a breadth of terms without being an “authority” page.

  14. Hi Mark,

    When you’re talking about query analysis, I just don’t think you can get away from those user behavior type metrics.

    The amount of signals that Google is looking at seem to grow and grow, and while there may always be a place for some kind of link analysis (some of the historical data patents also include looking at links and anchor text over time), I do think watching how people interact with search results has a role in future rankings.

    Some queries are seasonal as well, and using this type of query analysis over time might help there as well. It’s quite possible that fewer people might be looking for ecommerce sites when searching for something like [christmas] in March than in November, for example. And the results that people click upon during those months and the pages that they tend to stay upon may differ, and the preferences behind those behaviors may have little to do with differences in PageRank or different information retrieval scores based on relevance.

  15. Hi Daniel,

    I think there’s a place and role for social results, but not for every query. People do also search without being logged into a Google Account as well. Those social results are definitely something we do have to keep in mind though, and consider how they might impact searches.

    I’d love to see whatever analysis people at Google might have done at this point on the impact of social results upon searches.

  16. Hi Rajesh,

    I’d love to see some hard data to compare these different types of factors. I think the “Hot topics” factor could sometimes have quite an influence as well. A topic doesn’t necessarily have to be a fresh one. For instance, some search topics and terms become hot because of seasonal events (superbowl, holidays, etc.).

  17. Hi Jim,

    It was quite an undertaking to write in one sitting as well. I started the post about three different times before I finally settled in. :)

    A blog is something that can definitely help when it comes to adding fresh content to a site, and also when writing about hot topics, especially if you publish frequently and add something new and unique to those topics.

  18. Hi scarfmedia,

    Of course we know that not all links are made equally, and that a single link from a very high quality (high PageRank) page can have much more value than links from hundreds of lower quality pages (if not even more). The fresher content may be making a difference, but it’s hard to tell if we’re just looking at the numbers of backlinks and not the quality of those links.

  19. Hi Jan-Willem,

    In the instance of a newly launched website, in most instances it’s not going to rank highly for a broad range of keywords out of the gate anyway.

    There are exceptions to that, and they might rely upon some of the other factors, for instance. One example I can think of is a blog someone set up in response to a Tsunami back in 2005 that attracted a lot of links from high authority sites like the Red Cross, Unicef, and other pages that help spearhead relief efforts, and a lot of attention in other places as well. Chances are that it was clicked upon by a lot of people when it appeared in search results as well, and people spent a fair amount of time on the pages of the site.

    It may just be possible that some of these factors may balance others out.

  20. Hi Grace,

    Thanks. I like looking at patents (and whitepapers) from the search engines because they are from the search engines. As primary resources created primarily to protect intellectual property, I think they can provide us with some interesting insights into how search engines feel about search. It’s possible that Google might not be doing all of the things described in these patents, but we can learn a lot about the assumptions that they make anyway.

  21. Hi Erica,

    The drop off you’re experiencing might be in part due to freshness, especially if the terms involved are ones in a topic or area that have been seeing a lot of new pages and information on the Web for those terms.

    Doing SEO means thinking what might appeal to people, and understanding the impacts of some of the limitations of computers. A human being probably stands a much better chance of looking through a list of 100 search results for a query and identifying the pages that are fresher and the pages that are older than a computer does. That may be why Google would rely (at least in part) upon clicks from people to determine which results are fresher.

  22. Wow, a lot to take in in one sitting but a good post nonetheless. I agree with regards to fresh content – every website I work on I tend to set-up a blog to sit alongside it as it makes the site more engaging and allows for a little more social sharing. All important factors… especially with all the recent changes to Google’s algorithm.

  23. Hi Lifeofdogs,

    Thank you. If a blog focuses upon recent and timely events, products, services, and ideas, then there is a good chance that when finding fresh content is the focus of searchers for those topics, you can see some benefit from the freshness factor described in a number of those patent filings.

  24. Bill,

    Thank you for another great article.

    I have been trying to figure out the best way to use “anytime….past 24 hours” to the best advantage since it showed up on the search results page. I am going to need to read and re-read this, and your related “patent” entries.

    I’ve read several other blog entries/SEO forum conversations that all seem to go down the road of the obvious or the speculative. Trying to understand information gleaned from patents seems to me the best way to start the conversations.

    Peace

  25. Hi Joe,

    Thank you.

    I do spend a little time everyday looking at the different date ranges and comparing the results for a few queries. It’s interesting how those change from one to another.

    The thing I like most about patents is that they do give an insight into search from the perspective of search engineers, and I don’t think at all that it hurts to pay attention to that.

  26. I love how Google is showing fresh results and news results. For instance, I heard a rumor that Blackberry 10 would be launched in January so I tried to Google it and the news feed and blogs that were written yesterday showed up. Normally, it’s old pages that haven’t been updated for awhile.
