Rewriting a Query Using Entity Detection
Google offers several special search operators that you can add to a query to narrow your results.
One of those is the “site” operator, which restricts a search to a specific domain or website.
Example “site search” queries:
site:www.seobythesea.com google patents
site:www.espn.go.com Derek Jeter
A newly granted patent from Google describes how the search engine may assume that a searcher would like to see results from a specific site alongside results from other pages on the Web. The patent attempts to compensate for searchers who fail to use the “site” operator in their searches. As the patent tells us:
Some search engines permit a user to restrict a search to a set of related documents, such as documents associated with the same website, by including special characters or terms in the search query. Oftentimes, however, users forget to include these special characters/terms or do not know about them.
The process behind this patent looks for what the inventors call “entities” as part of the search query. An entity can be “anything that can be tagged as being associated with certain documents.” For example, entities can include:
- News sources,
- Online stores,
- Product categories,
- Brands or manufacturers,
- Specific product models,
- Condition (such as new, used, refurbished, etc.),
- Places, and;
Some entity names are unambiguous and unique, while many others are somewhat ambiguous or generic. If an entity name can be identified, a searcher’s query might be rewritten based upon that entity name. That rewritten query may become part of the search results shown to a searcher, or a link to “site” search results may be provided.
The entity names may be found on the Web in directories, lists, and other places and may be associated with a particular set of pages.
Entity Detection Example
The term “MSNBC” may be identified as an entity associated with the set of pages at the domain http://www.msnbc.msn.com/. If someone were to search for [george bush msnbc], Google might rewrite that search to be [George Bush site:www.msnbc.msn.com/], and include those results within the set of search results for [george bush msnbc], possibly near or at the top of those results. Or, it may include a link to results for that “site” search at the top of the results. It’s also possible that, since the entity “MSNBC” is a source of news content, news results blended into the Web search results may focus upon site search results from http://www.msnbc.msn.com/.
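The rewrite in this example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the method the patent claims: the `ENTITY_SITES` table and the `rewrite_with_site` function are hypothetical names, entity names are matched as single whole words, and the trailing slash on the domain is dropped.

```python
# Hypothetical entity table mapping an entity name to the site it is
# associated with. A real system would build this from directories,
# lists, and other sources rather than hard-coding it.
ENTITY_SITES = {"msnbc": "www.msnbc.msn.com"}


def rewrite_with_site(query, entity_sites=ENTITY_SITES):
    """Return a site-restricted rewrite of the query, or None if the
    query contains no known entity name (single-word match only)."""
    terms = query.split()
    for i, term in enumerate(terms):
        site = entity_sites.get(term.lower())
        if site:
            # Drop the entity name and append a site: restriction instead.
            remaining = terms[:i] + terms[i + 1:]
            return " ".join(remaining + [f"site:{site}"])
    return None


print(rewrite_with_site("george bush msnbc"))
```

A query with no recognized entity name returns `None`, which a caller could treat as "run the original query unchanged."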
The patent is:
Query rewriting with entity detection
Invented by Hong Zhou, Krishna Bharat, Michael Schmitt, Michael Curtiss, and Marissa Mayer
Assigned to Google
US Patent 7,536,382
Granted May 19, 2009
Filed: March 31, 2004
A system receives a search query, determines whether the received search query includes an entity name, and determines whether the entity name is associated with a common word or phrase. When the entity name is associated with a common word or phrase, the system generates a link to a rewritten query, performs a search based on the received search query to obtain the first search results, and provides the first search results and the link to the rewritten query.
When the entity name is not associated with a common word or phrase, the system rewrites the received search query to include a restricted identifier associated with the entity name, generates a link to the received search query, performs a search based on the rewritten search query to obtain second search results, and provides the second search results and the link to the received search query.
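The two branches described in the abstract can be sketched as follows. This is a rough illustration, not the patent's implementation: the `Entity` and `SearchPlan` types, the `plan_search` function, and the sample entries (including treating "target" as an entity name that is also a common word) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    site: str             # restriction identifier, e.g. a domain
    is_common_word: bool  # True if the name is also a common word or phrase


@dataclass
class SearchPlan:
    query_to_run: str              # query whose results are shown
    alternate_link: Optional[str]  # query offered to the searcher as a link


# Hypothetical sample data, for illustration only.
ENTITIES = {
    "msnbc": Entity("www.msnbc.msn.com", is_common_word=False),
    "target": Entity("www.target.com", is_common_word=True),
}


def plan_search(query, entities=ENTITIES):
    """Decide which query to run and which to offer as a link."""
    terms = query.split()
    for i, term in enumerate(terms):
        entity = entities.get(term.lower())
        if entity is None:
            continue
        rest = terms[:i] + terms[i + 1:]
        rewritten = " ".join(rest + [f"site:{entity.site}"])
        if entity.is_common_word:
            # Ambiguous name: search the original, link to the rewrite.
            return SearchPlan(query, rewritten)
        # Unambiguous name: search the rewrite, link back to the original.
        return SearchPlan(rewritten, query)
    # No entity name detected: run the query as received.
    return SearchPlan(query, None)


print(plan_search("george bush msnbc"))
print(plan_search("archery target"))
```

Keeping the original query available as a link in the unambiguous case mirrors the abstract's point that the searcher can always get back to the search they actually typed.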
Entity Detection Conclusion
The patent provides more details on how the inclusion of an entity name may influence the search results you see and how Google might identify entity names and associate them with specific pages.
One potential impact of this query rewriting process is that when queries include brand names, business names, product names, or any of the other kinds of “entity names,” pages associated with those entities may appear at the top of the search results.
19 thoughts on “Boosting Brands, Businesses: How a Search Engine May Rewrite a Query Using Entity Detection”
Google search operators are very helpful for thinning out unwanted results in Google searches. If Google can recognize web entities and replace that with the site: operator, it will help the millions of users who never add the site: operator to their search query.
One problem I have noticed, using the latest version of Firefox on both Windows Vista and Ubuntu: the link: operator is not returning Google results. It simply returns the main search page with no results. However, the link: operator works fine with Internet Explorer.
Has anyone else encountered this problem? I’m not sure if it is a Firefox bug or is maybe related to my internet service provider. Thanks.
Hi People Finder,
There’s a positive and a negative to this process, I think. While it may help people who might have used the site: operator if they knew about it, it looks like it may have the effect of moving the site that is “associated” with the entity name to the top search result, even if that result may not be the most “relevant” for that search. Organizations, sites, businesses, and brands that are well known on the Web may benefit to the detriment of ones that aren’t as well known.
I’m not experiencing the same problem that you are with the link operator for Google Results. I wonder if a Firefox extension is causing the problem that you’re experiencing, or if anyone else is having the same problem. I don’t know if anyone would answer your question at the Google Webmaster Help Group – http://www.google.com/support/forum/p/Webmasters?hl=en – but it might be worth posting there. It’s the one place I can think of where you might actually get an answer from someone at Google.
That is true. This could end up being good for entities that are already well known. Existing ‘kings of the mountain’ may benefit the most from this.
Hi People Finder,
I wonder how much that might have been considered in the creation of this process. It may take some effort for site owners to have their “entities” recognized by Google and associated with specific pages. There are businesses with brands, business identities, and product lines that have a head start, but I imagine that getting that kind of recognition should be achievable for organizations and individuals that make an effort.
Here’s my point of view trying to look from the average user perspective:
Google has done many improvements with its philosophy of taking 100% responsibility and trying to adapt THEM to the user and not trying to adapt the USER to them. So instead of educating users how to search using their rules they analyze how users search and adapt by people rules.
This is another try to do this. I think that the entry of adding msnbc to the search query won’t replace msnbc but just give a priority to that site. If I’m looking for an article on diet and write: Diet msnbc maybe I’m not just looking for an article written by their authors but also for reviews of that article on other sites. So that’s the negative side.
Anyway, this blog is awesome because it gives a food for thought (or whatever they call it :). It’s always nice to get into the “minds” of search engines 🙂
Hi Finder Mind,
Thanks for your perspective. You raise some excellent points. Google does seem to be trying to adapt to searchers rather than “educating” searchers on how to use their search engine.
Trying to keep search as simple as possible can be a good idea. One example is blending results into a web search from their other databases, such as news, images, videos, blogs, books, scholar, etc., rather than requiring searchers to click on the different tabs for each of those results. If Google thinks that a “site” search result might be appropriate based upon the inclusion of an “entity” like “msnbc” in a query, we might see results from “msnbc.com” placed at the top of the search results, with a link to “see more results” from that domain. As you note, people might be looking for reviews of an msnbc article, and those might also appear in the search results after the msnbc listing.
Query refinement suggestions, including spell correction suggestions, are another example of where the search engine tries to make it easier for searchers to find what they are looking for.
I would love to see some data from Google on how many people actually use some of the different advanced search operators, like the “site” search. That might give us a better idea why they might come up with a process like the one described in this patent.
While reading your post I had a similar idea like yours:
If someone writes something like: weight loss msnbc the first results might be from msnbc and the searcher can be offered more options to search only the domain msnbc.
The problem I see here is wording. How would you communicate with short words to the searcher the point that clicking that link would search the domain…
Maybe: Search only MSNBC website for “keyword” or: Display more results on “keyword” from MSNBC website, or something similar would be the solution who knows.
Hi Finder Mind,
That’s a question that I’ve been wondering about the answer to as well.
It’s possible that a result from a domain associated with a named entity may be boosted to the top result, with an additional indented result shown, and then a link under that to see “More results from http://www.example.com.” I’m not sure that it is necessary for Google to tip their hand that they’ve integrated a “site” search into the search results page, and it may possibly confuse searchers to come up with some different wording.
I had always suspected Google was rewriting my query for its own engine. Rand referenced this blog entry concerning Google’s decision making process concerning intent.
Thanks for the heads up. There’s always been the possibility that Google (and other search engines) would rewrite some of your queries, including doing things like spell correction, considering plurals and singulars and other possible variants of words, identifying synonyms, and more.
If the Google search engine identifies what it believes is an entity in your query, and it has associated a particular page with that entity more than other pages, it might show extra pages from that site, based upon a perception that you intended to search the pages of that site.
It makes things interesting, doesn’t it?
Great post. I agree with Findermind.