Somewhere in an alternative universe, one of the most feared hitters in baseball might instead have been known as one of its greatest pitchers. Babe Ruth started out as a pitcher for the Boston Red Sox in 1914, and when approached in 1918 about getting his bat into the lineup on a daily basis, his manager Ed Barrow responded, “I’d be the laughingstock of baseball if I took the best lefthander in the league and put him in the outfield.” A couple of years later, Ruth was sold to the New York Yankees for an unprecedented $125,000, and he proceeded to hit 54 home runs for them, beginning a pretty good career hitting a baseball instead of throwing it at people.
In 1920, anyone looking for information about the Babe probably wasn’t too interested in his pitching career. Likewise, when someone searches today for [world series champion], it’s likely that they are looking for fresh results. How does a search engine like Google determine when searchers might prefer fresh results, and when they might prefer older results?
Google may monitor the results of search queries over time and adjust those results based upon certain factors. Google was granted a patent called Document scoring based on query analysis (US Patent 8,051,071), filed on November 22, 2006 and granted on November 1, 2011. The patent describes a number of different types of query-based analysis that Google might use to rerank search results.
The patent is a continuation patent of Google’s Historical Data patent, which I described last month in my post, 10 Most Important SEO Patents: Part 2 – The Original Historical Data Patent Filing and its Children. Interestingly, a number of continuation patent applications for this Query Analysis patent were published last week. The claims sections of those patent applications focus upon specific aspects of the different factors that might influence rankings in search results.
Query Analysis Factors
Here are the query analysis factors appearing in the original patent:
Selections Over Time – When a particular page tends to be chosen by searchers over other pages in a set of search results for a query, that page might be bumped up in rankings for that query.
Hot Topics – Some search terms may increasingly appear in queries over a period of time, which can indicate that a topic is hot and has gained in popularity. Pages which are associated with those queries may be boosted in search rankings over pages that aren’t.
Related Hot Topics – When queries “similar” to a given query become hot, with a growing number of searches for them, pages associated with that given query might be boosted in search results.
Constant Queries with Consistently Changing Results – Some queries remain constant in the number of searches for them, but the results for those queries tend to change over time, such as a query for “world series champion.” How the results for these queries change may be monitored and used to rank pages accordingly.
Freshness of Documents – How stale or fresh a document is might be based upon factors such as a document’s creation date, growth of anchor text, levels of traffic, changes to content, growth or decrease in links to and from the pages, and other factors. For some queries, recent documents may be considered very important. For example, if a search is for a Frequently Asked Questions (FAQ) page, the most recent version might be much more desirable than an older one.
Google might learn for which queries recent changes are most important by analyzing which documents in search results tend to be selected by users, and by considering how often users favor a more recent document that is ranked lower than an older document in the search results. If a page tends to be included in results for mostly topical queries such as “World Series Champions” rather than more specific queries, such as the name of a team that recently won the World Series, then this by itself might be used to lower the ranking for a page that appears to be stale.
Staleness of Documents – For some queries, older pages might be considered better choices than newer ones, and the decisions in those cases might be based upon how often those older pages are selected in search results over newer documents. If searchers tend to select a lower ranked relatively stale document for a given query over a higher ranked, relatively recent document, the stale page might be adjusted to rank higher.
Overly Broad Pages – If a page tends to rank fairly well for a range of “discordant” queries, that might be a signal that the page is a “web spam page.”
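The “constant queries with consistently changing results” factor lends itself to a small illustration. What follows is a minimal sketch, not Google’s method: it assumes the search engine keeps per-period search counts for a query and the top results shown in each period, and it uses two hypothetical thresholds, a coefficient of variation to decide whether search volume is steady and a Jaccard overlap to measure how much the top results change from one period to the next. The function names and threshold values are illustrative only.

```python
from statistics import mean, pstdev


def jaccard(a, b):
    """Overlap between two result lists, from 0.0 (disjoint) to 1.0 (identical sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_constant_query_with_changing_results(
    period_volumes, top_results_by_period, max_volume_cv=0.2, max_overlap=0.5
):
    """Hypothetical test: steady search volume, but top results that keep turning over."""
    if not period_volumes or len(top_results_by_period) < 2:
        return False  # not enough history to judge
    avg = mean(period_volumes)
    if avg == 0:
        return False
    # Coefficient of variation: a low value means the query is searched at a steady rate.
    volume_cv = pstdev(period_volumes) / avg
    # Compare each period's top results with the following period's.
    overlaps = [
        jaccard(prev, curr)
        for prev, curr in zip(top_results_by_period, top_results_by_period[1:])
    ]
    return volume_cv <= max_volume_cv and mean(overlaps) <= max_overlap
```

Under these assumptions, a query like [world series champion] with roughly level weekly volume and top results that share little from season to season would be flagged, while a query whose volume spikes and then fades, or whose results never change, would not.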
Newly Filed Continuation Patent Applications
The claims sections of the patent applications from last week go into a little more detail on a number of those factors and add to them. These continuation patent applications were all filed on September 26, 2011, and published on January 19, 2012.
Trends Related to Topics and Search Terms
Document Scoring Based on Query Analysis (US Patent Application 20120016889)
This continuation patent application focuses upon analyzing trends in search queries to identify topics and search terms that increasingly appear in queries over time, and how frequently they appear. Pages within those topics that include the terms (or similar terms) that are increasingly being searched for might be boosted in results, while pages within those topics that don’t include those terms might not be.
What this patent application doesn’t tell us is how some search terms might be considered related or similar to others that are trending upward in popularity.
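The trend analysis described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the method in the patent application: “trending” here means a term’s share of queries grew by a hypothetical factor (`min_ratio`) between two periods, and because the application doesn’t say how similar terms are identified, the sketch only matches exact terms. All names, the tokenizer, and the boost value are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def trending_terms(earlier_queries, later_queries, min_ratio=2.0, min_count=2):
    """Hypothetical: terms whose share of queries grew by at least min_ratio between periods."""

    def count_terms(queries):
        counts = Counter()
        for q in queries:
            counts.update(tokenize(q))  # each term counted once per query
        return counts, max(len(queries), 1)

    early_counts, n_early = count_terms(earlier_queries)
    late_counts, n_late = count_terms(later_queries)
    trending = set()
    for term, count in late_counts.items():
        if count < min_count:
            continue  # too rare in the later period to call a trend
        late_rate = count / n_late
        early_rate = early_counts[term] / n_early
        # A term absent earlier but now appearing repeatedly counts as trending.
        if early_rate == 0 or late_rate / early_rate >= min_ratio:
            trending.add(term)
    return trending


def boost_for_trends(score, page_text, trending, boost=1.2):
    """Boost a page's score if it contains any trending term; otherwise leave it alone."""
    return score * boost if tokenize(page_text) & trending else score
```

A fuller version would also need some notion of which pages belong to a topic and which terms count as “similar,” the part the application leaves open.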
Access Times to Determine Freshness and Staleness
Document Scoring Based on Query Analysis (US Patent Application 20120016888)
How much time do visitors spend on specific pages they select from search results for particular queries? That can be another factor that might be considered when reranking results, in combination with how frequently those pages are selected in search results for a particular query.
If the amount of time that people spend looking at a particular page decreases from one specific time period to another, that might be an indication that the page is stale, and the ranking of that page might be lowered.
If the amount of time people spend looking at a specific page increases from one time period to another, that might indicate that the document is fresh, and its ranking score might be adjusted upwards.
The relevance of those documents would still play a strong role in how they are ranked.
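A minimal sketch of the access-time comparison, assuming the search engine records how many seconds searchers spend on a page after selecting it for a query, grouped into two consecutive periods. Using the mean, the size of the adjustment (`step`), and the function name are all hypothetical choices; the small multiplicative nudge is meant to reflect the point that relevance still dominates the final ranking.

```python
from statistics import mean


def adjust_for_access_time(relevance_score, earlier_seconds, later_seconds, step=0.1):
    """Hypothetical: nudge a relevance score based on how viewing time changed between periods.

    earlier_seconds / later_seconds: seconds searchers spent on the page after
    selecting it from results for one query, in two consecutive periods.
    """
    if not earlier_seconds or not later_seconds:
        return relevance_score  # not enough data; leave the score alone
    before, after = mean(earlier_seconds), mean(later_seconds)
    if after < before:
        return relevance_score * (1 - step)  # shorter visits: treated as a sign of staleness
    if after > before:
        return relevance_score * (1 + step)  # longer visits: treated as a sign of freshness
    return relevance_score
```

Because the adjustment only scales the incoming score, a highly relevant page with slightly shorter visits can still outrank a marginally relevant page whose visits grew longer.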
Frequency of Selection
Document Scoring Based on Query Analysis (US Patent Application 20120016874)
This patent application looks at the frequency of selection of specific pages for specific queries. How frequently a certain page is selected in search results over one period of time might be compared to how frequently that page is selected in search results over a later period. If the page is selected less often, it might be lowered in search results. If it is selected more often, it might be raised in rankings.
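The period-to-period comparison could look something like the sketch below. One assumption worth flagging: the application talks about how frequently a page is selected, but this sketch divides selections by impressions so that a change in how often the query itself is searched doesn’t masquerade as a change in preference. `PeriodStats`, the `tolerance` dead zone, and the `step` size are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    impressions: int  # times the page was shown in results for the query
    selections: int   # times searchers clicked on it

    def selection_rate(self):
        return self.selections / self.impressions if self.impressions else 0.0


def adjust_for_selection_trend(score, earlier, later, step=0.1, tolerance=0.01):
    """Hypothetical: raise or lower a score when the page's selection rate changes between periods."""
    change = later.selection_rate() - earlier.selection_rate()
    if change < -tolerance:
        return score * (1 - step)  # selected less often: lower it
    if change > tolerance:
        return score * (1 + step)  # selected more often: raise it
    return score  # change too small to act on
```

For example, a page clicked 20 times out of 100 impressions in one period and 60 times out of 200 in the next would have its selection rate rise from 0.2 to 0.3 and be nudged upward.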
When Staleness Might be Preferred
Document Scoring Based on Query Analysis (US Patent Application 20120016871)
For some queries, searchers seem to prefer older results that might be perceived as “stale” pursuant to an analysis of user trends regarding that query. Regardless of that perception, if searchers tend to bypass higher ranking fresh looking results and prefer clicking on older documents, then the search engine might rank older documents higher than fresher ones, even if the older documents don’t rank as well from a relevance perspective.
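Here is one way to sketch that behavior, with the caveat that it is a hypothetical illustration rather than anything the application spells out. It assumes click logs for a single query and uses a publication date as a stand-in for however freshness is actually measured. A click “bypasses” fresher results when the clicked page is older than some page ranked above it; if at least a hypothetical `threshold` share of clicks do that, the bypassing pages get a `boost` large enough that they can overtake more relevant-looking fresh pages.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    url: str
    rank: int        # position in the original results, 1 = top
    published: date  # stand-in for however "fresh" is measured
    score: float


def stale_selections(results, clicked_urls):
    """URLs of clicked results that were older than some higher-ranked result."""
    by_url = {r.url: r for r in results}
    picked = set()
    for url in clicked_urls:
        r = by_url.get(url)
        if r and any(o.rank < r.rank and o.published > r.published for o in results):
            picked.add(url)
    return picked


def rerank_for_stale_preference(results, clicked_urls, threshold=0.5, boost=1.25):
    """Hypothetical: if most clicks bypass fresher, higher-ranked pages, boost the older pages chosen."""
    known = {r.url for r in results}
    clicks = [u for u in clicked_urls if u in known]
    stale = stale_selections(results, clicks)
    share = sum(1 for u in clicks if u in stale) / len(clicks) if clicks else 0.0

    def adjusted(r):
        return r.score * boost if share >= threshold and r.url in stale else r.score

    # Sort by adjusted score, keeping the original order for ties.
    return [r.url for r in sorted(results, key=lambda r: (-adjusted(r), r.rank))]
```

With a fresh page A at rank 1 (score 0.9) and a 2005 page B at rank 2 (score 0.8), clicks of B, B, A would mean two of three clicks bypassed a fresher result, so B’s boosted score of 1.0 would move it above A.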
Spam Determination Based Upon Breadth of Rankings, and Authority
Document Scoring Based on Query Analysis (US Patent Application 20120016870)
When a page ranks highly for a wide range of different search queries, then that might be a sign that it is spam, and its rankings may be negatively adjusted.
This patent application provides an exception for pages that might be considered “authoritative documents.” An authoritative document might be (1) a government document, (2) a document associated with a web directory, or (3) a document that has maintained at least a threshold rank over time. While the descriptions in the historical data patent and the original query analysis patent did mention that Google might make some exceptions for authoritative documents, those mentions appeared in passages about link-based criteria and ranking history; the exception wasn’t mentioned in the query-based factors or in the actual claims of those patents.
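The spam check and its exception can be sketched as below. The three kinds of authoritative document come from the application; everything else is an assumption made for illustration: “discordant” queries are approximated by counting distinct topic labels a page ranks highly for, a rank history is a list of per-period scores where higher is better, and the thresholds and penalty are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    is_government: bool = False
    in_web_directory: bool = False
    # Hypothetical per-period rank scores (higher is better), oldest first.
    rank_history: list = field(default_factory=list)
    # Hypothetical topic labels for the queries this page ranks highly for.
    ranking_topics: set = field(default_factory=set)


def is_authoritative(page, min_rank=0.8, min_periods=6):
    """The three kinds of authoritative document named in the application, with made-up thresholds."""
    recent = page.rank_history[-min_periods:]
    held_rank = len(recent) == min_periods and all(s >= min_rank for s in recent)
    return page.is_government or page.in_web_directory or held_rank


def spam_adjusted_score(page, score, max_topics=5, penalty=0.5):
    """Demote pages that rank for too many unrelated topics, unless they are authoritative."""
    if len(page.ranking_topics) > max_topics and not is_authoritative(page):
        return score * penalty
    return score
```

Under these assumptions, an ordinary page ranking across ten unrelated topics would have its score halved, while a government page or a page that has held a high rank for six straight periods would keep its score despite the same breadth.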
If you read through the descriptions of these new patent applications, those descriptions cover a much wider range of topics than the claims associated with them, and are very similar to the descriptions in the Historical Data patent.
There are a lot of interesting things in the descriptions, like the idea that if a page rises too quickly in search rankings it might be demoted somewhat. But the important parts of these patent applications are the claims sections, which focus much more narrowly on specific aspects of query-based analysis of search results and pages.
The claims in this group of newly published patent applications discuss how some topics might quickly become popular, and how pages in search results for popular search terms (or similar search terms) within those topics might be boosted in search results for those topics.
They tell us that pages that are viewed longer and selected more frequently for specific queries from one time period to another might be boosted in results as well. A couple of the other patent applications frame freshness and staleness in terms of changes in access times and in how often specific pages are selected. They tell us that in some cases searchers might prefer newer, fresher pages, while in other cases older, staler pages might be better results, and either type of result may be boosted as appropriate.
We’re also told that pages that rank too well for too many queries might be considered web spam unless they are determined to be “authoritative” pages, such as government pages, web directory pages, or pages that have maintained high rankings over a period of time. That “authoritative” designation is one of the things that stands out for me in these patents.
Of course, these are only patents and patent applications, and it’s possible that Google might be doing something somewhat different than what is described within the claims of these documents. But it’s worth spending some time thinking about how Google might be analyzing search results for queries performed at their search engine, and how it might influence what searchers see when they perform a search.