Query Logs and the Slang of the Web

One way to help in that process of organizing the Web is to use what people do in the Web.

– Ricardo Baeza-Yates, from a presentation on Extracting Semantic Relations from Query Logs

How related might different search queries be when they share a number of pages in search results, and searchers tend to click upon those shared results more than other results?

If you go to Yahoo’s search, and perform a search for the term [wcca], the first result that you see in the search results is a page titled “Wisconsin Circuit Court Access.” If you search for [wisconsin circuit court], you’ll see the same page at the top of the search results. If many people searching for each of those terms tend to mostly click on the link for that page, and no other pages, it’s possible that Yahoo might start considering those query terms to be very closely related.

Because of that relationship, the search engine might start offering searchers a query suggestion for a related term at the top of the search results for an original query.

A recent Yahoo patent application explores these types of relationships, and tells us that the search engine might learn a great deal from comparing which search results searchers click upon. It describes three relationships between query terms, based upon click data found in its query logs, where it keeps track of which results searchers choose for specific queries.

Synonyms (close relationship) – Query terms that share a substantially equivalent set of clicked search results.

Lesser but included (inclusive relationship) – Where the set of clicked results for one query term is smaller in size than another, and those clicked URLs are substantially included in the clicked URLs for the second query.

Related (lesser relationship) – Where the clicked search results between two queries overlap, but not quite to the same level as the two relationships above – synonyms and lesser but included.

In my example above, if people searching for [wcca] and [wisconsin circuit court] mostly click upon that first search result for “Wisconsin Circuit Court Access,” the search engine might consider those query terms to be synonyms.
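
To make the three relationships a little more concrete, here is a minimal Python sketch of how clicked results for two queries might be compared. It is not the method from the patent application: the `classify` function, the `Relation` names, the `high` and `low` thresholds (standing in for the filing's "substantially" wording), and the example.org URLs are all assumptions made up for illustration.

```python
from enum import Enum


class Relation(Enum):
    SYNONYM = "synonym"
    INCLUDED = "lesser but included"
    RELATED = "related"
    NONE = "none"


def classify(clicked_a, clicked_b, high=0.8, low=0.2):
    """Classify how query A relates to query B from their clicked-URL sets.

    `high` and `low` are hypothetical thresholds; the patent application
    only speaks of sets being "substantially" equivalent or included.
    """
    a, b = set(clicked_a), set(clicked_b)
    if not a or not b:
        return Relation.NONE
    shared = len(a & b)
    cover_a = shared / len(a)  # share of A's clicked URLs also clicked for B
    cover_b = shared / len(b)  # share of B's clicked URLs also clicked for A
    if cover_a >= high and cover_b >= high:
        return Relation.SYNONYM
    if cover_a >= high and len(a) < len(b):
        # A's smaller set of clicks sits mostly inside B's larger set.
        return Relation.INCLUDED
    if shared / len(a | b) >= low:
        # Some overlap, but not enough for either relationship above.
        return Relation.RELATED
    return Relation.NONE


# Hypothetical clicked URLs for the two queries from the example above.
wcca = {"example.org/wcca"}
wisconsin_circuit_court = {"example.org/wcca"}
print(classify(wcca, wisconsin_circuit_court))
```

With both queries leading to clicks on the same single page, the sketch labels them synonyms. A real system would presumably weigh how often each result is clicked rather than treating every clicked URL equally.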

The choice of which pages searchers click upon is viewed as implicit user feedback – searchers aren’t explicitly stating that these queries are related in some way, but when they choose shared pages in search results for those queries, it’s assumed that the terms are related.

What would a search engine do with this information?

It might offer query suggestions at the top of search results for a related query, or reformulate or expand search results to include results that are also relevant for the other query term. The search engine might also use these relationships to match queries with advertisements, and in other possible ways. We’re told about this process, that:

Embodiments can detect the slang of the Web (e.g., a taxonomy used by users to perform searches on the Web).
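
As a rough illustration of the query suggestion use, the sketch below groups a click log by query and ranks other queries by how much their clicked results overlap with those of the original query. The log format, the `clicked_sets` and `suggest` helpers, the `min_overlap` cutoff, and the example.org URLs are hypothetical, and Jaccard overlap is simply one convenient measure rather than anything the filing specifies.

```python
from collections import defaultdict


def clicked_sets(log):
    """Group a click log of (query, clicked_url) pairs into URL sets per query."""
    sets = defaultdict(set)
    for query, url in log:
        sets[query].add(url)
    return dict(sets)


def suggest(query, log, min_overlap=0.5):
    """Return other queries ranked by Jaccard overlap of their clicked URLs."""
    sets = clicked_sets(log)
    mine = sets.get(query, set())
    if not mine:
        return []  # no click data for this query, so nothing to suggest
    scored = []
    for other, urls in sets.items():
        if other == query:
            continue
        overlap = len(mine & urls) / len(mine | urls)
        if overlap >= min_overlap:
            scored.append((overlap, other))
    # Highest overlap first; ties broken alphabetically for stable output.
    return [q for _, q in sorted(scored, key=lambda s: (-s[0], s[1]))]


# A tiny made-up click log: (query, clicked URL).
log = [
    ("wcca", "example.org/wcca"),
    ("wisconsin circuit court", "example.org/wcca"),
    ("wisconsin circuit court", "example.org/courts"),
    ("cheese", "example.org/cheese"),
]
print(suggest("wcca", log))
```

Here a search for [wcca] would be offered [wisconsin circuit court] as a suggestion, while [cheese] shares no clicked results and is left out.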

The patent application is:

Extracting Semantic Relations from Query Logs
Invented by Ricardo Baeza-Yates and Alessandro Tiberi
Assigned to Yahoo
US Patent Application 20090164895
Published June 25, 2009
Filed: December 19, 2007

The inventors listed on the patent filing have also written a paper on this topic, Extracting semantic relations from query logs, which is available to subscribers of the ACM portal. If you’re not a subscriber, there is a video presentation on it from Ricardo Baeza-Yates, which I also linked to at the start of this post.

Three Yahoo Research papers co-authored by Ricardo Baeza-Yates cite that paper, and are worth a look if you’re interested in how search engines might use query logs:

  • Search, Web 2.0, and the Semantic Web The importance of search (pdf)
  • Clique analysis of query log graphs (pdf)
  • The anatomy of a large query graph (pdf)

7 thoughts on “Query Logs and the Slang of the Web”

  1. Hi People Finder,

    I agree with you that it is an approach that could make Yahoo’s search smarter. I do like the idea that they are exploring ways to use data about how people search to make those searches better. I’ve been seeing them do that in some other areas as well.

  2. Wow, interesting synopsis on this patent. I think this is a smart move on Yahoo’s part. Instead of simply correcting spelling, improve the search relevance or offer alternate suggestions that are commonly effective for a specific term. Thanks for the update Bill.

  3. You can see why Bing (Microsoft) wanted to group with Yahoo. Google offers this type of association with their ‘suggestions links’ at the top or bottom of a search results page, dependent on how it feels. Again Bill, a quality article. When do you get time to sleep?

  4. Hi Joel,

    You’re welcome. It is interesting to see Yahoo actually looking at user interactions with search results to draw conclusions about how related those results might be, so that they can do things like come up with query suggestions. I like the idea in theory, and am wondering how effective it is in practice.

  5. Hi Lee,

    Thanks. There may be some rhyme or reason why Google presents the query suggestions that it does, where it does.

    I think the idea behind their placement of suggestions is that suggestions that are more likely to lead a searcher off in a different direction are placed at the top of results, and the suggestions that provide good “additional” or “alternative” information to the results that are shown are placed at the bottom of results.

    Sleep? 🙂
