Quality Scores for Queries: Structured Data, Synthetic Queries and Augmentation Queries

Quality Scores and Augmentation Queries

This past March, Google was granted a patent that involves giving quality scores to queries (the quotes below are from that patent). The patent refers to high-scoring queries as augmentation queries. It is interesting to see that searcher selection is one way that might be used to determine the quality scores for queries.

So, when someone searches, Google may compare the SERPs they receive from the original query to augmented query results based on previous searches using the same query terms or synthetic queries.

This evaluation against augmentation queries is based upon which search results have received more clicks in the past. Google may decide to add results from an augmentation query to the results for the original query to improve the overall search results.

In general, the subject matter of this specification relates to identifying or generating augmentation queries, storing the augmentation queries, and identifying stored augmentation queries for use in augmenting user searches. An augmentation query can be a query that performs well in locating desirable documents identified in the search results. In addition, user interactions can determine the performance of an augmentation query. For example, if many users that enter the same query often select one or more of the search results relevant to the query, that query may be designated an augmentation query.

In addition to actual queries submitted by users, augmentation queries can also include synthetic queries that are machine-generated. For example, an augmentation query can be identified by mining a corpus of documents and identifying search terms for which popular documents are relevant. These popular documents can, for example, include documents that are often selected when presented as search results. Yet another way of identifying an augmentation query is mining structured data, e.g., business telephone listings, and identifying queries that include terms of the structured data, e.g., business names.

These augmentation queries can be stored in an augmentation query data store. When a user submits a search query to a search engine, the terms of the submitted query can be evaluated and matched to terms of the stored augmentation queries to select one or more similar augmentation queries. The selected augmentation queries, in turn, can be used by the search engine to augment the search operation, thereby obtaining better search results. For example, search results obtained by a similar augmentation query can be presented to the user along with the search results obtained by the user query.
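
To make the matching step in that last paragraph a little more concrete, here is a minimal Python sketch. The patent does not say how submitted terms are matched against stored augmentation queries, so the term-overlap (Jaccard) measure, the function names, the threshold, and the sample store below are all assumptions of mine for illustration.

```python
def term_overlap(query_a: str, query_b: str) -> float:
    """Jaccard overlap between the term sets of two queries (an assumed measure)."""
    terms_a = set(query_a.lower().split())
    terms_b = set(query_b.lower().split())
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def select_augmentation_queries(user_query, store, min_overlap=0.5, limit=2):
    """Return up to `limit` stored queries whose overlap meets `min_overlap`."""
    scored = [(term_overlap(user_query, q), q) for q in store]
    # Skip the user's own query and anything below the (assumed) threshold.
    matches = [(s, q) for s, q in scored if s >= min_overlap and q != user_query]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in matches[:limit]]


# Hypothetical augmentation query data store.
store = [
    "best pizza chicago",
    "chicago deep dish pizza",
    "cheap flights new york",
]

selected = select_augmentation_queries("pizza chicago", store)
```

In this toy example, the two pizza queries are selected and the flights query is not. A search engine could then run the selected queries and present some of their results along with the results for the user's query.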

How does Google find augmentation queries? One place to look for those is in query logs and click logs. As the patent tells us:

To obtain augmentation queries, the augmentation query subsystem can examine performance data indicative of user interactions to identify queries that perform well in locating desirable search results. For example, augmentation queries can be identified by mining query logs and click logs. Using the query logs, for example, the augmentation query subsystem can identify common user queries. The click logs can be used to identify which user queries perform best, as indicated by the number of clicks associated with each query. The augmentation query subsystem stores the augmentation queries mined from the query logs and/or the click logs in the augmentation query store.
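
A rough sketch of that mining step might look like the following. The log format, the `min_submissions` cutoff, and the function name are hypothetical; real query and click logs would hold far more detail than these flat lists.

```python
from collections import Counter

# Hypothetical logs: one entry per query submission, one entry per click.
query_log = ["red shoes", "red shoes", "red shoes",
             "blue hat", "blue hat", "rare query"]
click_log = ["red shoes", "red shoes", "blue hat"]


def mine_candidates(query_log, click_log, min_submissions=2):
    """Return common queries, ordered by how many clicks each one attracted."""
    submissions = Counter(query_log)
    clicks = Counter(click_log)
    # "Common" here simply means submitted at least `min_submissions` times.
    common = [q for q, n in submissions.items() if n >= min_submissions]
    return sorted(common, key=lambda q: clicks[q], reverse=True)


candidates = mine_candidates(query_log, click_log)
```

Here the query log tells us which queries are common, and the click log tells us which of those perform best, which mirrors the two roles the patent gives to those logs.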

This doesn’t mean that Google uses clicks to determine rankings directly, but it is deciding which augmentation queries might be worth using to provide SERPs that people may be satisfied with.

How Does Google Determine Quality Scores for Augmentation Queries?

There are other things that Google may look at to decide which augmentation queries to use in a set of search results. The patent points out some other factors that may be helpful. Quality scores for augmentation queries may be built from several other scores.

In some implementations, a synonym score, an edit distance score, and/or a transformation cost score can be applied to each candidate augmentation query. Similarity scores can also be determined based on the similarity of search results of the candidate augmentation queries to the search query. In other implementations, the synonym scores, edit distance scores, and other types of similarity scores can be applied on a term-by-term basis for terms in search queries that are being compared. These scores can then be used to compute an overall similarity score between two queries. For example, the scores can be averaged; the scores can be added; or the scores can be weighted according to the word structure (nouns weighted more than adjectives, for example) and averaged. The candidate augmentation queries can then be ranked based upon relative similarity scores.
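
Here is a small Python sketch of one of the options that passage lists: an edit distance score applied term by term, averaged into an overall similarity score, and used to rank candidates. It leaves out synonym scores, transformation costs, and part-of-speech weighting, and the way edit distance is scaled into a 0 to 1 similarity is my own assumption rather than anything stated in the patent.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling row of the DP table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def term_similarity(a: str, b: str) -> float:
    """Scale edit distance into a 0..1 similarity (1.0 means identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def query_similarity(query: str, candidate: str) -> float:
    """Average, over the query's terms, of each term's best match in the candidate."""
    query_terms = query.lower().split()
    candidate_terms = candidate.lower().split()
    if not query_terms or not candidate_terms:
        return 0.0
    best = [max(term_similarity(t, c) for c in candidate_terms)
            for t in query_terms]
    return sum(best) / len(best)


def rank_candidates(query, candidates):
    """Order candidate augmentation queries by descending similarity."""
    return sorted(candidates, key=lambda c: query_similarity(query, c), reverse=True)


ranked = rank_candidates("chep flights",
                         ["cheap flights", "flight deals", "hotel rooms"])
```

With the misspelled query "chep flights", the candidate "cheap flights" ranks first, since "chep" is only one edit away from "cheap" and "flights" matches exactly.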

I’ve seen white papers from Google before mentioning synthetic queries, which are performed by the search engine instead of human searchers. So it makes sense for Google to explore query spaces in a manner like this, see what the results are like, and use information such as structured data as a source of those synthetic queries. I’ve written about synthetic queries at least a couple of times before, including in the post Does Google Search Google? How Google May Create and Use Synthetic Queries.

Implicit Signals of Query Scores and Quality

It is an interesting patent in that it talks about things such as long clicks and short clicks, and ranking web pages based on such things. In addition, the patent refers to such things as “implicit signals of query quality.” More about that in the patent here:

In some implementations, implicit signals of query quality are used to determine if a query can be used as an augmentation query. An implicit signal is a signal based on user actions in response to the query. For example, implicit signals can include click-through rates (CTR) related to user queries, long click metrics, and/or click-through reversions, as recorded within the click logs. A click-through for a query can occur, for example, when a user of a user device selects or “clicks” on a search result returned by a search engine. The CTR is obtained by dividing the number of users that clicked on a search result by the number of times the query was submitted. For example, if a query is input 100 times and 80 persons click on a search result, then the CTR for that query is 80%.

A long click occurs when a user, after clicking on a search result, dwells on the landing page (i.e., the document to which the search result links) of the search result or clicks on additional links present on the landing page. A long click can be interpreted as a signal that the query identified information that the user deemed interesting, as the user either spent a certain amount of time on the landing page or found additional items of interest on the landing page.

A click-through reversion (also known as a “short click”) occurs when a user, after clicking on a search result and being provided the referenced document, quickly returns to the search results page from the referenced document. A click-through reversion can be interpreted as a signal that the query did not identify information that the user deemed to be interesting, as the user quickly returned to the search results page.

These example implicit signals can be aggregated for each query, such as collecting statistics for multiple instances of using the query in search operations and can further compute an overall performance score. For example, a query with a high CTR, many long clicks, and few click-through reversions would likely have a high-performance score; conversely, a query with a low CTR, few long clicks, and many click-through reversions would likely have a low-performance score.
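
The patent describes these signals but does not give a formula for combining them, so the following Python sketch is only one way it could be done. The CTR calculation follows the patent's own example (80 clicks out of 100 submissions is 80%). The `performance_score` formula, the class, and the sample numbers are assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    submissions: int   # times the query was submitted
    clicks: int        # submissions that led to a click on a result
    long_clicks: int   # clicks followed by dwelling on the landing page
    reversions: int    # clicks followed by a quick return ("short clicks")


def click_through_rate(stats: QueryStats) -> float:
    """Clicks divided by submissions, as in the patent's 80-out-of-100 example."""
    return stats.clicks / stats.submissions if stats.submissions else 0.0


def performance_score(stats: QueryStats) -> float:
    """Assumed formula: CTR scaled by the net share of long clicks over reversions."""
    if stats.clicks == 0:
        return 0.0
    long_share = stats.long_clicks / stats.clicks
    reversion_share = stats.reversions / stats.clicks
    return click_through_rate(stats) * max(0.0, long_share - reversion_share)


# Hypothetical aggregated statistics for two queries.
strong = QueryStats(submissions=100, clicks=80, long_clicks=60, reversions=10)
weak = QueryStats(submissions=100, clicks=20, long_clicks=2, reversions=15)
```

Under this formula, the query with a high CTR, many long clicks, and few reversions scores well above the query with a low CTR and many reversions, which is the ordering the patent describes.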

[Figure: quality scores for queries]

The reasons for the process behind the patent are explained in the description section of the patent, where we are told:

Often, users provide queries that cause a search engine to return results that are not of interest to the users or do not fully satisfy the users’ need for information. Search engines may provide such results for several reasons, such as the query including terms having term weights that do not reflect the users’ interest (e.g., in the case when a word in a query that is deemed most important by the users is attributed less weight by the search engine than other words in the query); the queries being a poor expression of the information needed; or the queries including misspelled words or unconventional terminology.

A quality signal for a query can be defined in this way:

the quality signal being indicative of the performance of the first query in identifying information of interest to users for one or more instances of a first search operation in a search engine; determining whether the quality signal indicates that the first query exceeds a performance threshold, and storing the first query in an augmentation query data store if the quality signal indicates that the first query exceeds the performance threshold.
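
The threshold test in that claim language is simple enough to sketch in a few lines of Python. The threshold value, the use of a set as the data store, and the function name are hypothetical; the patent only says that a query is stored when its quality signal exceeds a performance threshold.

```python
def store_if_qualified(query, quality_signal, store, threshold=0.5):
    """Add `query` to `store` only when its quality signal exceeds `threshold`."""
    if quality_signal > threshold:
        store.add(query)
        return True
    return False


# Hypothetical augmentation query data store and quality signals.
augmentation_store = set()
kept = store_if_qualified("cheap flights", 0.72, augmentation_store)
dropped = store_if_qualified("asdf qwerty", 0.05, augmentation_store)
```

Only the first query ends up in the store, since the second one falls below the assumed threshold.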

The patent can be found at:

Query augmentation
Inventors: Anand Shukla, Mark Pearson, Krishna Bharat and Stefan Buettcher
Assignee: Google LLC
US Patent: 9,916,366
Granted: March 13, 2018
Filed: July 28, 2015

Abstract

Methods, systems, and apparatus, including computer program products, generate or use augmentation queries. In one aspect, a first query stored in a query log is identified, and a quality signal related to the performance of the first query is compared to a performance threshold. The first query is stored in an augmentation query data store if the quality signal indicates that the first query exceeds a performance threshold.

References Cited about Augmentation Queries

These were several references cited by the patent applicants, which looked interesting, so I looked them up to see if I could find them to read and share here.

  1. Boyan, J. et al., “A Machine Learning Architecture for Optimizing Web Search Engines,” School of Computer Science, Carnegie Mellon University, May 10, 1996, pp. 1-8. cited by applicant.
  2. Brin, S. et al., “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Computer Science Department, 1998. cited by applicant.
  3. Sahami, M. and Heilman, T. D. 2006. A web-based kernel function for measuring the similarity of short text snippets. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23-26, 2006). WWW ’06. ACM Press, New York, NY, pp. 377-386. cited by applicant.
  4. Ricardo A. Baeza-Yates et al., The Intention Behind Web Queries. SPIRE, 2006, pp. 98-109. cited by applicant.
  5. Smith et al., “Leveraging the structure of the Semantic Web to enhance information retrieval for proteomics,” vol. 23, Oct. 7, 2007, 7 pages. cited by applicant.
  6. Robertson, S.E., Documentation Note on Term Selection for Query Expansion, J. of Documentation, 46(4): Dec. 1990, pp. 359-364. cited by applicant.
  7. Talel Abdessalem, Bogdan Cautis, and Nora Derouiche. 2010. ObjectRunner: lightweight, targeted extraction and querying of structured web data. Proc. VLDB Endow. 3, 1-2 (Sep. 2010). cited by applicant.
  8. Jane Yung-jen Hsu and Wen-tau Yih. 1997. Template-based information mining from HTML documents. In Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative application of artificial intelligence (AAAI’97/IAAI’97). AAAI Press, pp. 256-262. cited by applicant.
  9. Ganesh Agarwal, Govind Kabra, and Kevin Chen-Chuan Chang. 2010. Towards rich query interpretation: walking back and forth for mining query templates. In Proceedings of the 19th international conference on World wide web (WWW ’10). ACM, New York, NY USA, 1-10. DOI=10.1145/1772690.1772692 http://doi.acm.org/10.1145/1772690.1772692. cited by applicant.

This is a Second Look at Augmentation Queries

This is a continuation patent, which means that an earlier version was granted before with the same description, and this version has new claims. When that happens, it can be worth looking at the old claims and the new claims to see how they have changed. I like that the new version seems to focus more strongly on structured data. It tells us that Google might use structured data from sites that appear for queries as synthetic queries, and if those meet the performance threshold, their results may be added to the search results that appear for the original queries. The claims seem to focus a little more on structured data as synthetic queries, but otherwise they haven’t changed enough to make it worth publishing them side by side to compare them.
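
As a simple illustration of how structured data could be turned into synthetic queries, here is a short Python sketch. The patent's own example is business telephone listings and queries that include business names; the record fields, the query templates, and the function name below are hypothetical choices of mine.

```python
def synthetic_queries(listing: dict) -> list[str]:
    """Build candidate queries from the fields of one structured-data record."""
    name = listing["name"].lower()
    city = listing["city"].lower()
    category = listing["category"].lower()
    # Assumed templates: the name alone, name plus city, category plus city.
    return [name, f"{name} {city}", f"{category} {city}"]


# Hypothetical business listing, loosely modeled on a directory entry.
listing = {"name": "Harbor Books", "category": "Bookstore", "city": "Portland"}

queries = synthetic_queries(listing)
```

Each generated query would still need to meet the performance threshold before it could be used as an augmentation query.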

What Google Has Said about Structured Data and Rankings

Google spokespeople have told us that structured data doesn’t impact rankings directly, but what they have been saying does seem to have changed somewhat recently. For example, in the Search Engine Roundtable post, Google: Structured Data Doesn’t Give You A Ranking Boost But Can Help Rankings, we are told that just having structured data on a site doesn’t automatically boost the rankings of a page. However, as this patent tells us, if the structured data for a page is used as a synthetic query, and that query meets the performance threshold for augmentation queries (achieving a certain quality score), results from it might be shown in search results, thus helping in rankings.

Note that this isn’t new. The continuation patent’s claims don’t appear to have changed that much, so structured data is still being used for synthetic queries, which are checked to see if they work as augmentation queries. Nevertheless, this does seem to be an excellent reason to make sure you are using the appropriate structured data for your pages.

I’ve written a few posts about patents involving quality scores for organic SEO:

Last Updated June 27, 2019

64 thoughts on “Quality Scores for Queries: Structured Data, Synthetic Queries and Augmentation Queries”

  1. Really interesting Bill. I like the idea of Google ‘ranking’ queries with an algorithm, similar to how an algorithm ranks URLs. Maybe at some point in the future, they’ll use browser data and signed-in profiles as part of algos to rank searchers!

  2. Hi Dylan,

    This is an algorithm. It does involve multiple steps, such as comparing the results of the original query (number of clicks and other user behavior signals) versus the results for a similar augmented query (a synonym or an otherwise similar query), to decide if it should include the results of the augmented query with the results of the original query too. But the process involved does follow the definition of what an algorithm is: “a process or set of rules to be followed in calculations or other problem-solving operations.” The augmented query is chosen based upon similarity to the original query, and may be selected from a pool of augmented queries that could be based upon things such as structured data associated with pages that rank for the original query.

  3. Hi Jignasha,

    If you haven’t started setting up pages to have them ranked, it’s time to start publishing pages and learning how to rank them. Google will continue to come out with new ways for pages to rank in search results, but if you don’t have any pages published, you don’t stand any chance of ranking at all.

  4. Thak you for sharing this information this information is very helpful for me and i bookmarked your site because i found your content important for me

  5. The great idea for Google Ranking Thanks for sharing this information. This article makes me understand the proper meaning of SEO and how it is important for post to rank on google.

  6. Hi Bill, I found your blog via Google while searching for such kinda informative post and your post looks very interesting for me.
    Thanks for sharing a great article.

  7. Hi Bill! I would like to thank you for the efforts you have put in writing this site. I’m hoping the same high-grade blog post from you in the upcoming also. Actually, your creative writing skills have encouraged me to get my own website now. Really the blogging is spreading its wings fast. Your write up is a great example of it.

  8. Hi Alan,

    Thank you. I hope that your blog gets off to a good start, and that you have fun with writing it and learning from it. I do confess that I have learned a great deal from blogging regularly over the past 12 years here. I continue to learn from it almost every day, too.

  9. Thank you for the information. I learned a lot from it. I appreciate you the detail you went into. I am grateful for the amount of time and effort you put into this helping us. Your insights and summary are beneficial.

  10. hi Bill, to be honest, i dont really know what i want to say except thanks a lot for sharing this article, its really nice. Greeting from indonesia!

  11. You are draw the structure its very helpful
    Many things clear after read this post

    Thanks for sharing

  12. Really enjoyed reading your blog.It is highly informative and builds great interest for the readers. For the people like us your blogs helps to get ideal information and knowledge. Thanks for providing such blogs.

  13. Interesting I also wonder if google also use tabular data to generate synthetic queries which are then used to generate FAQ/Question’s people ask that appear in serps.

  14. wow Awesome post
    I was searching for something like this. I liked the way you mention your understanding
    Thank you so much for sharing this post with us.

  15. This is very informative article today i Know how google work and put ranking on URL please keep it up sharing more information about google algorithm

  16. Hi @neuromancer,

    It’s possible that Google may be using tabular data to generate synthetic queries that may then be used in other ways, such as to build a question graph that may inform related questions that appear in SERPs.

  17. Hi Anirudh,

    YES, the process that Google describes in the patent they were granted does tell us some interesting things they might be doing with structured data that we may not have understood them to be doing before.

  18. Great Explanation about the algorithm with flow chart made it the best way to understand. Thank you for such a good article. I will be waiting for more articles like this one.

  19. I would like to say thanks to you for sharing such information as it helps me in gaining that extra knowledge that rarely people tell. A great read and I would surely recommend it to my friends.

  20. Hello Sir

    Awesome Post! I like your idea Google ‘ranking’ queries with an algorithm.

    Thanks for sharing great information here.

  21. This article is extremely knowledgeable, do give it a read. Thank you so much (Bill Slawski) for such a great piece of information. Kindly write more such articles as your content is very simple and helps people like me understand things better. Thank you so much.

  22. Hi MD,

    I’m glad that you liked this post. I tried to make it as easy to understand as possible so I’m really happy that it came through to you as well as it did. Thanks.

  23. Hello, Team seobythesea
    Very good to listen to you.This is very amazing tips and tricks given by you Which is really matter in ranking and we just don’t care about it. Your way of writing is very good.
    Well, thanks a lot for sharing with us such a nice content.

  24. Thak you for sharing this information this information is very helpful for me and i bookmarked your site because i found your content important for me

  25. Wow,
    I’m stunned by your knowledge for this subject, it was a fantastic article.

  26. Great Explanation about the algorithm with flow chart made it the best way to understand. Thank you for such a good article. I will be waiting for more articles like this one.

  27. Hi Bill

    Thanks for sharing very informative post, I am bookmark your website for further article awesome tip here i found.

    Thanks for sharing great post!

  28. hlw
    i am shama malik your blog is very nice and too good for us.This articles give valuable knowledge and suggestion.
    thanks for this post

  29. Very useful information about queries and i am glad know about it and very good structure data.

  30. Nice post and well written. I was checking consistently your blog and I am impressed! Incredibly useful information specifically the last part. I maintain such info much. I used to be looking for this certain information for a very long time. Thank you…

  31. Your blog is very useful for us. This blog gives valuable knowledge and suggestion about structured data and ranking.

  32. You just rock it! when it comes to writing a perfect blog. I am pretty much impressed with your writing skills. Awesome Blog and very informative articles. Thanks for providing such a class written article.

  33. Great post. I used to be checking continuously this blog and I am inspired! Extremely helpful information particularly the ultimate part 🙂 I take care of such info much. I was seeking this certain information for a long time. Thank you and best of luck

  34. Thank you so much for sharing this information, this will surely help me in my work and therefore, I would like to tell you that very fewer people can write in a manner where the reader understands just by reading the article once carefully.

  35. Bill,

    It seems like the augmented queries are in full force with misspellings and to some extent with generic single word queries where the query intent is unclear. Also, I’ve seen results returning “interesting finds”. These results are usually pulled in from FAQ pages and blogs. My question is, could the augmented queries be used to generate knowledge graphs, site links, and featured snippets?

    Ashish

  36. great post, Thanks a lot for sharing this info, You have given valuable knowledge for structure data and rankings..
    keep it Up..!!!

  37. Hi,
    I am really happy to say it’s an interesting post to read. I learn new information from your article. This is great advice! Very honest and practical. I really enjoyed this post. Nice post!! these tips may help Great post, Jo! My favorite work advice.Thanks so much for a detailed post! It is very helpful for, you are doing a great job. Keep it up.

  38. No one share such insights on blog.
    I found this article very helpful for me
    Will read again, now I bookmarked your blog.
    Thanks! for sharing

  39. Hi Montana,

    This post has nothing to do with your question at all. But I can answer based upon my experience ranking sites: links do seem to still make a difference, and sites without many links appear to have difficulties ranking highly at places such as Google. On-page SEO can help a page rank well, and a combination of on-page SEO and links really is the sweet spot to aim for to rank.

  40. Thank you for the information. I learned a lot from it. I appreciate you the detail you went into. I am grateful for the amount of time and effort you put into this helping us. Your insights and summary are beneficial.

  41. Usually, I never comment on blogs but your article convinced me to comment on it as is written so well. And telling someone how awesome they are is essential so that on my part I convince you to write more often.

  42. Hi Bill, I wish there was more technical SEO content like this in my language. Congratulations for the quality of your blog and thanks for continuing to share such informative entries. Also, the references to the patents that you include are very interesting and useful.

  43. Great article. Can you please recommend any Structured Data or technical SEO course you think will be of great help to me? Regards.

Comments are closed.