What Is Query Breadth, and What Does It Mean for Search Results Rankings?
Do the keywords in page titles on a Google search for [Digital Camera] carry more weight than the keywords in titles on a search for [Canon Rebel Digital Camera]? It’s a possibility.
How much weight do the keywords in your page title carry in search rankings? Or anchor text in a link pointing to that page? If I told you that the weight carried by different ranking signals could vary based upon many circumstances, that might be a little frustrating. Unless I could point out a good example of how a search engine might devalue the impact of some ranking signals, and in doing so, boost other ones.
A Google patent granted today explains how something called “query breadth” can influence the weight of popularity-based ranking signals, and in doing so, alter how much weight relevance-based signals might carry.
Query Breadth can mean the number of results returned by the search query, the length of the query, the IR score drop-off, or some other measure of breadth.
Ranking Signals and Search Engines
When someone searches at a conventional search engine, they see results in a specific order determined by several ranking signals. Google has told us that it considers at least 200 different signals. Those can involve such things as whether the terms used in a search query appear on pages, the quality of links pointing to those pages, and many others.
Some of these signals could be considered information retrieval, or relevancy, signals because the scores for pages depend upon the words used in a query. For instance, on a search for [Canon Rebel Digital Camera], a more highly relevant page might use that phrase many times on the page itself, and may have those words or very related words appear as anchor text in links pointing to the page.
Other signals might measure importance or popularity, such as the quantity and quality of links pointing to a page. For instance, a link from the New York Times front page might carry more weight and provide a greater popularity score than a link from the front page of my local bi-weekly paper, the Fauquier Times-Democrat.
Still other ranking signals might combine both relevancy and popularity. One example is the number of times a particular page is displayed to searchers in response to a particular query. Another is the number of times that page gets clicked on when shown to a searcher after a search for a certain term.
How Broad Queries Impact Some Popularity Measures
Search engines often return many pages when someone searches for more common one- or two-word phrases, such as [digital camera]. These can often number in the millions. A vast number of people tend to search for more common phrases like that one as well. Chances are that people performing that search click a lot on links to web pages that show up in the top results for that search. If the number of impressions or clicks for those results is used as a popularity score, those top pages may be over-represented in popularity counts.
That kind of over-representation of user-behavior data can mean that other results for broad queries have considerably lower popularity scores and have difficulty outranking the more popular pages, even if the owners of those pages make changes to make those pages more relevant for those particular queries.
Because of that, when a query is a broad one, a search engine might not give as much weight to popularity-based ranking signals, which would mean that relevance signals would, in turn, carry more weight.
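To make that idea concrete, here is a minimal sketch of how a breadth-dependent discount on popularity might work. The function name, the linear discount, and the score ranges are all my own illustrative assumptions; the patent does not spell out a formula like this.

```python
def combined_score(relevance, popularity, breadth, max_breadth=1.0):
    """Blend a relevance score with a popularity score, discounting
    popularity as the query grows broader.

    Illustrative sketch only -- not the patent's actual formula."""
    # Scale the popularity weight down toward 0 as breadth approaches max_breadth
    popularity_weight = max(0.0, 1.0 - breadth / max_breadth)
    return relevance + popularity_weight * popularity

# For a narrow query (breadth 0.1), popularity counts almost fully;
# for a broad query (breadth 0.9), it counts far less.
narrow = combined_score(relevance=0.6, popularity=0.3, breadth=0.1)
broad = combined_score(relevance=0.6, popularity=0.3, breadth=0.9)
```

With the same relevance and popularity inputs, the broad-query score leans almost entirely on relevance, which is exactly the rebalancing the patent describes.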
The Google patent is:
Methods and systems for adjusting a scoring measure based on query breadth
Invented by Karl Pfleger and Brian Larson
Assigned to Google
US Patent 7,925,657
Granted April 12, 2011
Filed: March 17, 2004
Methods and systems for adjusting a scoring measure of a search result based at least in part on the breadth of a previously executed search query associated with the search result are described.
In one described system, a search engine determines a popularity measure for a search result. It then adjusts the popularity measure based at least partly on a query breadth measure of a previously executed search query associated with the search result.
The search engine may use a variety of query breadth measures. For example, the search engine may use the number of results returned by the search query, the length of the query, the IR score drop-off, or some other measure of breadth.
Different Ways a Search Engine Might Measure Query Breadth
There are a few different ways that a search engine might measure query breadth, and it may consider these in combination with each other:
- The higher the number of results returned by the search query, the broader the query
- The drop off in information retrieval (or relevance) scores from one result to another, for example, how much of a drop off there might be from the IR score of the first result to the tenth result, or the hundredth result, or so on. If there isn’t much of a drop off in those scores as you drill deeper and deeper into the results, then the query can be fairly broad.
- How frequently people search for a particular query or very similar queries – the more frequently, the more broad the query could be
- The smaller the number of terms in a query, the broader it might be
Some queries might be short, or searched for frequently, but still be very narrow queries, so a search engine may rely on a combination of measures like those listed above rather than on any single one.
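A rough sketch of how those measures might be combined into a single breadth score follows. The thresholds, caps, and equal weighting here are invented for illustration; the patent leaves the exact combination open.

```python
def query_breadth(num_results, query_terms, ir_scores, search_frequency):
    """Combine several rough breadth indicators into one score in [0, 1].

    All thresholds and weights are made-up illustrations."""
    # 1. More results -> broader (cap at 10 million results)
    results_signal = min(num_results / 10_000_000, 1.0)
    # 2. Fewer query terms -> broader
    length_signal = 1.0 / len(query_terms)
    # 3. Small IR-score drop-off between the top result and deeper results -> broader
    dropoff = (ir_scores[0] - ir_scores[-1]) / ir_scores[0]
    dropoff_signal = 1.0 - dropoff
    # 4. More frequent searches -> broader (cap at 100,000 searches per day)
    frequency_signal = min(search_frequency / 100_000, 1.0)
    # Simple average; a real system might weight these very differently
    return (results_signal + length_signal + dropoff_signal + frequency_signal) / 4

breadth = query_breadth(
    num_results=24_000_000,
    query_terms=["digital", "camera"],
    ir_scores=[0.95, 0.94, 0.93, 0.92, 0.91],  # nearly flat scores for the top results
    search_frequency=250_000,
)
```

A query like [digital camera] scores high on every indicator here, so it comes out as quite broad, while a long, rarely searched phrase with a steep IR drop-off would score much lower.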
Query Breadth Takeaways
We don’t know how much weight Google might be giving to popularity-based user-behavior measures such as impressions and click-throughs in search results. They may be giving some weight to those.
We also don’t know much about the gaps between web pages that appear in search results for particular queries. When someone searches, the first result might be considerably more relevant for the query used than the second result, or it might be a photo finish between the two. We also don’t know how close the first search result and the 10th result or the 100th result might be in terms of an information retrieval score.
We do know that it’s often more likely that the higher-ranking results will be seen and clicked on more often, and that they may increasingly grow in terms of the popularity-based aspect of their ranking scores. That can potentially keep pages that improve in the relevance-based part of their ranking scores from outranking pages that continue to grow in popularity.
How Broad Might a Query Be?
This patent provides a way to counterbalance that growing popularity: it describes several ways to gauge how broad a query might be, and for sufficiently broad queries, Google may devalue the impact of popularity scores such as clicks and impressions.
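As a worked illustration of that counterbalancing, consider two hypothetical pages competing on a broad query. The relevance numbers, click counts, and the 10,000-click cap are all invented for the example.

```python
# "incumbent" has held a top spot and accumulated many clicks;
# "improved" was recently made more relevant but has few clicks so far.
incumbent = {"relevance": 0.70, "clicks": 9_000}
improved = {"relevance": 0.80, "clicks": 500}

def score(page, popularity_weight):
    # Normalize clicks to a 0-1 popularity score (the 10,000-click cap is arbitrary)
    popularity = min(page["clicks"] / 10_000, 1.0)
    return page["relevance"] + popularity_weight * popularity

# With popularity at full weight, the incumbent stays on top...
full = (score(incumbent, 1.0), score(improved, 1.0))
# ...but if query breadth scales popularity back, relevance decides the order.
scaled = (score(incumbent, 0.1), score(improved, 0.1))
```

At full weight the incumbent’s accumulated clicks keep it ahead; with popularity scaled back for a broad query, the more relevant page finally outranks it, which is the rich-get-richer problem the patent is aimed at.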
This is just one example where the actual weights of different ranking signals may vary based upon a specific circumstance, such as query breadth. Chances are there are many others as well.
So, when someone asks you how much weight a title element carries when it comes to search rankings, and you answer, “It depends,” and they ask you for an example, query breadth is one that you can now point to as a possibility.
36 thoughts on “Time to Add Query Breadth to Your SEO Glossary?”
So basically it’s adjusting the inflation of the high CTR, impressions, and popularity that sites are getting from broad queries, through a query breadth analyzer?
This reminds me of Google’s reasonable surfer patent, which discusses ways that links on a page might be weighted on the basis of how likely they are to receive clicks. It sounds like Google has been seeking ways to temper crude algorithmic assessments for years — something we have always suspected but for which there has been relatively little public proof (excluding numerous vague comments from Google employees).
So if a 3-word search query has several results and they have good IR scores and PageRank, what might the other factors be that help you get higher rankings? I think Bill is right on CTR and impressions; what else could there be?
If you would be so kind as to lend your opinion to something I have been wondering, and theorising about:
Do you think it is likely that Google varies its algorithm by topic, giving niches different ranking factors based on audience?
One of the most interesting things about this patent isn’t the patent itself – it’s how it hints at the ways that Google is using user-data, despite being exceptionally coy about this for a long time. That this patent is about ensuring that user-signals don’t have excessive influence over other signals demonstrates just how important a part of the algorithm they probably are.
And don’t forget that this patent was filed in 2004 (when I first got into SEO). Back then, Google would pretty much flat out deny that they were using any user data in their algorithms.
New information for me but I believe the key is still relevancy.
We’re given examples of popularity-based signals like CTR and impressions in the patent description, but it’s possible that other signals that rely to a degree upon popularity might be viewed as well.
The new +1 buttons in search results for example, which we were told may now be worked into impacting search results (see: High-quality sites algorithm goes global, incorporates user feedback) is one of the latest popularity-based user behavior signals that we’ve been alerted to by Google. Are those being tempered by signals like query breadth?
Good point. I was reminded of the Reasonable Surfer patent as well.
We have been told by Matt Cutts and others from Google that often the ways that certain algorithms are implemented are more sophisticated than they may seem on the surface, and a lot of patent filings that described different signals include discussions of baselines and thresholds and alternative implementations that may help ranking signals involved be less prone to manipulation.
It is nice to see at least one approach spelled out explicitly that tells us how the search engine may be going about things to avoid potential problems with using a certain ranking signal. As the inventors tell us in the patent:
* My Emphasis
Chances are that a lot of the standard ranking signals that we are aware of are still used in ranking pages, such as the relevancy of a page for terms used in a query, the link popularity of the page itself, and others. But, on top of those are a lot of different filters that Google may use to rerank search results such as country preferences, language preferences, customization based upon a previous search, personalization based upon individual and aggregated user-behavior signals, quality-type signals, and more.
The patent points out CTR and impressions as some potential user-behavior signals, but this patent was originally filed in 2004, and we don’t know to what degree those may have been used in helping to rank pages. It is quite possible that approaches like query breadth have been used in limiting the impact of some popularity-based signals though, even signals other than CTR and impressions.
There are a great number of ways that Google may classify pages and other types of documents on the Web, and use different algorithms to rank those differently.
For example, different genres of pages most certainly follow different algorithms. Blog posts and news articles may see freshness playing a larger role in how they rank than product type pages, for instance. Some pages may rank differently than others based upon how well of a match they are for different types of searcher intent, such as navigational, transactional, and informational type pages. The signals on some pages may be looked at differently if there is potentially some kind of geographic relevance to them, and they match a geographic intent.
There have been a number of white papers and patents from Google that describe how they may create profiles for people (individuals and as members of different groups for certain interests), for queries (to understand the diversity of potential results, to understand different possible meanings in different contexts, and more), and for websites (for example, some domains are the “perfect” results for some navigational queries).
One of the things that Google likely introduced during the “Big Daddy” infrastructure upgrade a few years back was the ability to plug in different modules of their ranking algorithms, where they could test and try out multiple simultaneous ranking signals for different classifications of sites based upon things like topic and type of query intent. This modular approach made it easier to turn on one collection of algorithms and turn off another with the press of a single button.
The following post that I wrote back in 2006 about Microsoft adopting a system like that is probably similar in a number of ways to what Google is doing:
Microsoft Patents Dynamic Ranking Changes
Good points. There have been a good number of patents and whitepapers from Google that describe how they could be using user-behavior signals when ranking pages. What I thought was interesting about this one is that it may be one of the first that I can recall that specifically explores how to temper those types of signals and rein them in to avoid having them hold too much sway over search results.
Google may or may not be using query breadth, but chances are that they’ve explored other approaches that might mitigate an excessive impact of some signals, and chances are that different ranking signals may carry different amounts of weight based upon a number of things, including the classification of a page into a category, a match with the intent of a searcher, or other features of pages or search results.
Another signal that I recall that may influence the rankings of a page is the average age of the results that appear for a query. If a top certain number of results tend to be older, there may potentially be a boost for older results. If a top certain number of results tend to be newer, then there may be a boost for fresher results.
No doubt that relevancy is still important, but how it’s weighed and used with other signals may vary based upon things like query breadth.
Thank you, Bill, for the follow-up.
Well, I am still at the beginning of my SEO work. I have some doubts about whether I can keep up with the constant changes: this post’s information, for instance.
Your blog has been very helpful to me!
ps. Sorry for my English, I am from Brasil and I speak Portuguese.
So when I look at keywords for my site, do I have to verify not only exact queries but also broad queries?
You’re welcome, Wissam.
Thank you for your awesome reply 🙂
is it a patent or a red herring?
Patents usually require that companies lodge specific, highly rigorous explanations of what a product or process does.
Companies try to make the patent as wide as possible so that it prohibits anyone using anything similar, forever.
So why would Google give up its special sauce, if it is the leader and its power rests in the secrets that keep its search engine at the top? What is the benefit in telling everyone else what you are about to do in your micro changes? Is it to impress investors or searchers, or to help SEO people have a better chance of rigging the system?
Do you see what I mean? Everyone I know in SEO complains about how Google changes things without warning and that the secret technology makes it hard to game the system, yet here they are, spelling out a patent? Something that they could surely hide inside their code that would be impossible to reverse engineer, unless stolen.
Am I missing something? I mean from the abstract the actual code of what they do still remains vague and hidden, so is this any more than PR or will it make search results better? Bill?
So, if I understood correctly, and if we connect that to user experience, an example situation could be like this:
One searcher makes 2 searches, one by one.
1st search: “London, England, Europe”
and 2nd “SEO services”
on the second, a page with the title “London England Europe internet service” could get a better position than a page with the title “SEO services Belgrade Serbia”
Am I on the right track?
This is getting a bit too complex… Question: what would happen if you removed the Google Analytics code from your website? How could Google then use your data and be able to determine your ranking?
Also, I do not think Google should be taking the +1 in the SERPs into account. If they do, this will be hugely manipulated! Next thing you know, someone will be offering a new +1 service. All you need is 1,000 different IPs and you are on your way!
I agree. It adjusts the inflation of the high CTR. Thanks for the share.
@John – I think Google really likes the data from GA but I am sure that it is able to monitor all of the 13 major global routers and work out traffic, IP addresses and individual computers anyway. I think that GA is just to make us feel safe and warm and hook us further into the Google machine (adwords).
Bill, thanks for your feedback. It’s nice to browse your blog; much knowledge can be gained.
I would certainly agree that a “direct hit” – that is using less than the 60 or 70 characters allotted on the title tag for the exact match gets more consideration, in my experience, than the more expansive and inclusive longer ones that happen to “phrase” include the term.
I’ve been considering whether keywords are broad or narrow in scope for years. It not only helps in determining how competitive they might be, but also which pages on a site might stand the best chance of competing for those terms.
You’re quite welcome.
I don’t think that Google gave away the secret sauce here, but they did reveal something about how they are attempting to address what might have been considered a flaw in an algorithm. I think the value of using patents for marketing is very limited. As for a red herring, I would suspect that would be most effective in leading a competitor to allocate time and resources towards a technology that likely won’t be developed, but which has some plausibility to it.
Chances are that the process described in this patent may work with a broader range of ranking signals than just impressions or clicks or click to impression ratios, and may involve other ways to mitigate artificially inflated popularity-based ranking signals.
The process described should potentially help enable good sites that get better to overcome the popularity gained by sites that have been showing up in search results ahead of them. I think the “closeness” of IR scores across a range of search results is probably a better way of gauging how much breadth a query might have than the length of the query or the estimated number of search results.
Does it give something away to people who are attempting to rank web pages higher in search results? We don’t have access to Google’s IR scores for pages, so anything we do to attempt to decide how much breadth a particular query might have is handicapped because of that.
Google has published a few whitepapers and patents that describe how they may use information from previous queries to influence what a searcher sees on their next query, but this doesn’t do that. The language in the patent is somewhat confusing, and I can see how you might have come to that conclusion, though.
What this is saying is that if you search for a particular query term, and then you select a particular page in search results on the basis of that query, Google may see that page as popular. If a lot of people choose the same page in response to that same query, then that page might be boosted somewhat in search results, because Google may be using those clicks as popularity scores.
Where there are potentially a lot of pages that might be a good match for a query (a lot of query breadth), then allowing the first results that have been showing in the top positions for that query to keep on accumulating popularity points may make it difficult for pages that are pretty close to as good for that query to ever outrank those pages that have been at the top for a while. In effect, the rich keep on getting richer.
If someone who doesn’t rank as highly spends the time making their page much more relevant, and builds some quality links to it, because the other pages have been building up popularity ranking scores, the page that has improved may still have trouble outranking pages that have been ranking highly for a while, even if those pages are no longer more relevant than the one that has been improved.
That’s what the method described in the patent is trying to fight – where there are potentially a lot of good results for a particular query, the value of popularity ranking signals might be scaled back so they don’t count as much.
Chances are good that Google isn’t getting its user behavior data from Google analytics, but rather from its query logs, from the Google Toolbar, from cookies, and from other places as well. It doesn’t need the Google Analytics account information to collect user behavior data.
As for The Google +1, chances are good that if Google decides to use the +1 information as part of a ranking signal, it would give more weight to people with more fleshed out Google Accounts and possibly do so based upon some kind of reputation score. If you open 1,000 Google Accounts and vote one page up 1,000 times, chances are those votes would probably not count for much at all, regardless of whether they were on different IP addresses or not.
Or of the high impression rate. Or a high number of bookmarks of a page, or other popularity based measures that Google might be using.
I’ll second that. 🙂
I used the relevancy of a title element as an example of one of many of the different relevancy based signals that a search engine might use to calculate a relevancy or IR score, but I really wasn’t commenting upon how best to optimize a title element or any other element with this post.
The point is, regardless of whether a phrase appears in a title as an actual phrase or as two words that aren’t necessarily adjacent to each other, relevancy signals might count more when there’s a lot of query breadth, because any possible popularity signals that might be associated with pages in search results may count less.
I agree. It moderates the manipulation of the CTR. Thanks for the share.
I’m encouraged that they understand that the high rankings of some sites might lead those sites to continue to rank well based upon popularity signals, even when there might not be that much of a difference in relevance among a fairly large number of sites.