Will Keywords be Replaced by Topics for Some Searches?

The example for the post I was writing for today appears to have been hijacked by the Simpsons. They made an apology to Judas Priest, after referring to the band as a death metal band. The image below is from a Guardian news article on the apology which is presently highly ranked on a search for the word “Judas”. See the search results below:

Bart Simpson writing on a bulletin board that Judas Priest is not a death metal band.

I wanted to show a set of Google search results that may have been based upon matching the topic of a post rather than its keywords. According to a Google patent granted on the last day of 2013, which uses that example, topic matching might help improve the relevance of search results for videos and other media-rich results.

Search results on a search at Google for Judas.

Topic-Based Search Results

Here’s the example from the patent, which gives us a view of Google behaving in a way that most of us aren’t accustomed to, even if some of us have written that Google may start focusing more on concepts than on specific keywords, and even though we’ve seen Google returning results under the Hummingbird update that don’t match all keywords within a query.

By way of example, consider a search query that includes the word “Judas.” That word, “Judas” can be mapped to certain domain topics such as “Born This Way” and “Lady Gaga.” “Born This Way” is the name of a popular album that includes a song called “Judas,” and “Lady Gaga” is the artist who created that album and performed the song “Judas.”

A conventional keyword-based search engine would only return results with the word “Judas;” however, the disclosed topic-based results can include results that are relevant, even if those results do not include the word “Judas.”

For example, such relevant results can include the words “Lady Gaga” or “Born This Way” and so forth.

The topic-based search results can therefore include many other results from the same album or by the same artist, even when the user is not aware of the titles of those related songs.

Did the video for “Judas” appear in those search results because Google performed a topic-based search, or would Google have ranked it highly anyway, based upon PageRank and relevance?

We can’t be completely certain, but the patent is worth looking at more closely and thinking about.

Multiple Sources to Identify Topics

It can be pretty difficult to read a patent about a possible ranking update and determine whether or not the method within the patent claims and/or description has been used.

There may be some technical limitations presently that might keep Google from completely incorporating topics into such an algorithm, as described in a paper posted this morning on the Freebase Google Plus page.

The paper is Trust, but Verify: Predicting Contribution Quality for Knowledge Base Construction and Curation (pdf) (Highly recommended reading!) Before I provide a link to the patent, this passage from the paper had me wondering how ready Google might be to start using topics to rank web pages:

While these results are not reported in the paper, during development we examined which of the concept space and expertise representation are most useful. Our analysis suggests that the Taxonomy and the Predicates concept spaces are more useful than the large Topics concept space.

This is because the Topics concept space has an order of millions of topics, thus spreading the expertise distribution too sparse for users contributing not a lot of triples.

The paper does a great job of explaining how Google might incorporate user contributions into Freebase, and it appears that topic-based contributions might not yet be as useful as other contributions. While Freebase does supply information used in Google’s knowledge base, it’s possible that Google might look to other sources, such as Open Information Extraction, to better understand things like topics.

The Google patent is:

Search query results based upon topic
Invented by Jianming He and Kevin D. Chang
Assigned to Google Inc.
US Patent 8,620,951
Granted December 31, 2013
Filed: June 1, 2012

Abstract

Systems and methods for returning results to a query based upon topic are disclosed herein. Aspects disclosed can be particularly useful when searching for videos or other media content for which associated textual information are generally relatively sparse compared to other types of content.

Text associated with the query can be semantically associated with various domain topics by mapping one or more words included in the query to one or more domain topics based upon a conditional probability of the domain topic given the query. A set of results can be identified based upon a conditional probability of the result given the domain topic.

Of course, we need to ask whether topic-based information from a knowledge base is even needed at this point.

Can Google get that information elsewhere?

The Open Information Extraction approach is one way for Google to find out that kind of information. Google seems to use both automated ways of gathering information and crowdsourced ways, such as people contributing to places like Freebase. It’s quite likely that both types of sources build upon each other.

Topics For Queries and for Results Based upon Probabilities

The patent tells us that the focus on identifying topics depends upon the calculations of probabilities related to topics, and can be broken down into a couple of steps or tasks:

First, domain topics can be identified based upon the query. Second, representative results for those domain topics can be located. Such tasks can be accomplished by analyzing suitable statistics associated with past queries and computing various conditional probabilities.
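As a rough illustration of that first step, here is a minimal Python sketch of estimating P(T|Q) from counts in a log of past queries. The log rows, the topic labels, and the function name `estimate_topic_given_query` are my own assumptions for illustration; the patent doesn't publish its data or code, and Google's actual statistics are surely far richer.

```python
from collections import Counter, defaultdict

# Hypothetical log rows: (query word, topic of the result the searcher chose).
# These rows are invented for illustration; they are not real Google data.
PAST_QUERIES = [
    ("judas", "Lady Gaga"),
    ("judas", "Lady Gaga"),
    ("judas", "Born This Way"),
    ("judas", "Judas Priest"),
    ("astronomy", "Hubble images"),
]


def estimate_topic_given_query(log):
    """Estimate P(T|Q) as count(Q, T) / count(Q) over the log rows."""
    counts = defaultdict(Counter)
    for query, topic in log:
        counts[query][topic] += 1

    probabilities = {}
    for query, topic_counts in counts.items():
        total = sum(topic_counts.values())
        probabilities[query] = {
            topic: n / total for topic, n in topic_counts.items()
        }
    return probabilities


p_topic_given_query = estimate_topic_given_query(PAST_QUERIES)
print(p_topic_given_query["judas"])
```

Simple relative frequencies like these are only one way to get a conditional probability; I chose them because they are the easiest to read, not because the patent says that's how it is done.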

The patent then goes on to provide more details, and mentions how some additional information can be used.

The conditional probability of a domain topic given a query, P(T|Q), can be employed to map domain topics to the query. The conditional probability of a result given a domain topic, P(R|T), can be employed to identify results for a topic-based search. These two probabilities, P(T|Q) and P(R|T), can be determined by various means detailed herein. In some embodiments, certain probabilities used to determine one or both P(T|Q) and P(R|T) can be determined by external components, and those externally-produced probabilities can be leveraged, if available.
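To make the two probabilities a little more concrete, here is a small Python sketch of how they might be combined. It scores each result by summing P(T|Q) × P(R|T) over topics. That particular way of combining them is my assumption, as are all of the numbers and result titles below; the passage above only says that P(T|Q) maps topics to the query and P(R|T) identifies results.

```python
# Hypothetical probabilities, invented for illustration only.
P_TOPIC_GIVEN_QUERY = {
    "judas": {"Lady Gaga": 0.5, "Born This Way": 0.25, "Judas Priest": 0.25},
}
P_RESULT_GIVEN_TOPIC = {
    "Lady Gaga": {"Judas (video)": 0.5, "Bad Romance (video)": 0.5},
    "Born This Way": {"Judas (video)": 0.5, "Born This Way (video)": 0.5},
    "Judas Priest": {"Breaking the Law (video)": 1.0},
}


def rank_results(query, p_topic_given_query, p_result_given_topic):
    """Score each result as the sum over topics of P(T|Q) * P(R|T)."""
    scores = {}
    for topic, p_topic in p_topic_given_query.get(query, {}).items():
        for result, p_result in p_result_given_topic.get(topic, {}).items():
            scores[result] = scores.get(result, 0.0) + p_topic * p_result
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_results("judas", P_TOPIC_GIVEN_QUERY, P_RESULT_GIVEN_TOPIC)
for result, score in ranked:
    print(f"{score:.3f}  {result}")
```

Notice that “Bad Romance (video)” gets a score even though the word “Judas” appears nowhere in its title, which is the behavior the patent's example describes.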

I’ve written recently about how Google might be working to identify related entities in the post, Entity Associations with Websites and Related Entities.

This patent tells us that Google may work to better understand topics possibly in similar ways, so that a query for “astronomy” might be seen as within a topic that could include “Hubble images”, including a video that might show off those images, even if the word “astronomy” doesn’t appear on the page that shows the Hubble images. (Another example from the patent.)

Popularity Based on Things such as Views and (YouTube) Likes

I haven’t seen a Google patent refer to “likes” before as something that could influence rankings, but this one does. The likes being referred to are most likely YouTube Likes rather than Facebook Likes, though the patent doesn’t distinguish between the two.

The patent tells us that the Hubble video, without any actual reference to Astronomy, might be returned as a search result because:

(1) It is established that “astronomy” and “Hubble images” are related concepts, and;
(2) The result is popular according to certain indicative metrics (e.g., views, likes, etc.).
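Those two conditions can be sketched in a few lines of Python. Everything specific here is hypothetical: the `Video` class, the related-topics table, the sample videos, the like weight of 10, and the popularity threshold of 1,000 are all invented for illustration, since the patent only names views and likes as examples of indicative metrics.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    topic: str
    views: int
    likes: int


# Hypothetical table of topics related to a query (condition 1).
RELATED_TOPICS = {"astronomy": {"Hubble images"}}


def popularity(video, like_weight=10):
    """Toy popularity metric: views plus weighted likes (weight is assumed)."""
    return video.views + like_weight * video.likes


def topic_results(query, videos, min_popularity=1000):
    """Keep videos on a related topic (1) that are also popular (2)."""
    related = RELATED_TOPICS.get(query, set())
    matches = [
        v for v in videos
        if v.topic in related and popularity(v) >= min_popularity
    ]
    return sorted(matches, key=popularity, reverse=True)


videos = [
    Video("Hubble Deep Field flythrough", "Hubble images", 50000, 1200),
    Video("My Hubble slideshow", "Hubble images", 40, 1),
    Video("Cat plays piano", "Pets", 900000, 30000),
]
results = topic_results("astronomy", videos)
print([v.title for v in results])
```

Only the first video passes both tests: the slideshow is on topic but not popular, and the cat video is popular but not on topic. Neither surviving title needs to contain the word “astronomy”.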

“Views” would make sense for a result that’s a video, but the patent’s claims section doesn’t limit this approach to just videos, even though the patent’s description says that they might be a good candidate for this approach because the text associated with things like videos tends to be limited.

Takeaways

The process within this patent doesn’t appear to be in effect yet, but seems like something that Google might just do sometime in the future – not so much a question of if they will do it, but rather when.

I’m going to be keeping an eye out for search results now that don’t include keywords within the actual query, but appear to be related by topic.

What about you?

29 thoughts on “Will Keywords be Replaced by Topics for Some Searches?”

  1. I don’t know, it seems as if they are forcing these changes a bit ahead of time. For me personally, I still like the idea of keyword based search, simply because a machine cannot understand concepts that well.

  2. Hi Ivan,

    It’s not just one machine. It’s a planet-wide network of connected machines and a growing knowledge base, fueled by the contributions of many humans to sources like Freebase, and by information about human activity: query log files, where information about query sessions can be found and associated together, and click logs, where the pages that real people selected in response to their queries can be found. It involves understanding taxonomies of concepts and topics from sources like Wikipedia, which is a project based upon the activities and knowledge of people.

    What we are seeing in patents like the one I wrote about in this post is really just the beginning, where machines are being taught about concepts and topics, but a lot of what they are learning comes from people involved in crowdsourcing that information, whether they are aware of it, as contributors to places like Wikipedia and Freebase, or not, as with query log and click log file information.

  3. I’m definitely noticing topical results for this already for many searches (not for media or video) for clients. Personal Injury lawyer for example, I can search “car accident lawyer” and it will return Personal Injury as the highlighted keyword.

    Great post Bill!

  4. Bill… Believe this is already happening to a large degree, where Hummingbird is the understanding-of-meaning-not-the-message, not keyword type match.

    The only question is “when?” does a keyword string value become of diminuous import, and I think that’s when links, and link anchor text, and referral page content / tags etc. become so less weighted that ‘traditional’ links no longer matter, and citation or database topic connections that reinforce relevance / authority become the ‘new’ link juice. (Also already happening to some degree)

    Cheers

  5. Thanks for sharing that observation, Chris.

    That replacement doesn’t sound like a synonym type replacement of the type we’ve seen from Google in the past – “Personal Injury” is definitely more like a topic derived replacement than a synonym type.

  6. Hi Grant,

    I definitely have to spend some more time doing searches and looking for some additional examples of these. The patents I’ve pointed to here in the past seem to focus more upon using substitute or synonym terms for terms within a query, while this topic-based search doesn’t seem to – so there are going to be some similarities and some differences as well.

    I agree with your points about reduction of the importance of citations/anchor text/PageRank for some queries, and there being a “the new link juice” or alternative ways of measuring popularity and quality other than just links in these topic based searches.

  7. I too have already seen this happening, which makes me wonder if Google tests its technological developments before applying for the patent. I think so. It’s typical of inventors in all industries to make prototypes of their inventions before applying for patents so that they can work out the problems and incorporate their knowledge with working models into the patent application.

    I think Google has actually been doing this for a while now.

  8. Hi Allen,

    I can’t say with certainty that I’ve personally seen results that look like they were based upon a topic instead of involving re-writing a query with a synonym, but yes it is quite possible that Google has experimented with doing this before the patent was granted. The synonym re-writes can happen a number of ways through a number of different processes, but the example that Chris gave in his comment does very much look like it was based upon a topic approach.

    Given the millions of searches at Google every day, running some experiments like this does make a lot of sense. Google doesn’t necessarily need to test something completely before applying for a patent, but I suspect that they often will while it’s pending.

  9. When you have The Simpsons and a popular Death Metal band in the same sentence, it’s going to rank well on Google. This is why I always inadvertently add “Justin Bieber” to all of my articles. It’s purely by accident, of course. Gangnam Style.

    “Popularity Based on Things such as Views and Likes” – I presumed this was a given, especially since Google Authorship became such a big deal. Page views will surely figure as well. I wonder what Google will make of Biz Stone’s Jelly app, which is adding a new dynamic to searching possibilities. Their answer to problems seems to be to spend about $3 billion to buy the offender (such as with Waze), so it’ll be interesting to see.

  10. Hi Alex,

    (Note to self – more Bieber references)

    I suspect that any author rank that comes out of Authorship is going to depend in part upon your actual social interactions as well as things like views and likes because (1) many social shares and posts often are time sensitive and their value may wear off if they depend upon accruing PageRank to start showing up in search results, (2) Things like views and likes and shares or comments might potentially be prone to abuse, but there are possibly some ways around that, like limiting the ones that you count to people signed into Google accounts that Google might trust.

    The Jelly question is interesting, and I suspect it’s getting a fair amount of scrutiny from Google and Yahoo! and Microsoft as a potential acquisition target. It wouldn’t be the first time that Google liked something enough from Biz Stone to buy his company.

  11. For some time Google has tracked click through and interaction with Search ads to determine a quality score. They do the same thing for reviews by individuals. It would make sense for them to use their own click stream data for a query to return the most selected result as well as the query transition from the user as they hone in on the specific information or result they are looking for. I am not sure that a better semantic understanding by the algo (Hummingbird) is the only driver; historical user behavior is most likely a strong signal as well.

  12. I have been looking out for these exemplary queries since Hummingbird, but as you note, it is hard to be 100% certain that the query really does highlight the way Google determines intent since August…
    so far… ‘missed tv programme’ returns catch-up and on demand TV results in the UK (1st result does not have the word ‘missed’ on the page).
    ‘festive exchange’ gives a lot of Christmas related results at local theatres. Again, not direct keyword matching there either.
    I will keep looking 😉

  13. Google is doing the right thing and focusing more on users. Keywords search had its glory for a very long time and now whats in the future of google search. thanks for this detailed post.

  14. Hi Bill,

    Thanks for your great work analyzing Google’s patents.

    I think it is a little weird to see how Google is only looking at YouTube’s indicators… What about other video platforms or those sites storing videos on their own servers? Are they going to trust external sources of ratings, maybe using Panda as a way to measure if they are trustworthy?

    Just thinking ;).

  15. What will they teach the GoogleBrain next, syllogism ?

    It’s less and less of a search algorithm now and evolving into an expert system, it probably won’t be long until it can parse natural human language, typed or spoken.

    Here’s an informative read (you’ve probably covered it) :

    http://www.google.co.in/patents/US20060166174

    Thanks for the great article.

  16. Hi Gary,

    Honestly never saw that patent application before. Many thousands of patents are filed each week, and at least 5,000 or so are published as pending applications each week as well.

    It was published as a pending application back in 2006, but hasn’t been granted yet. I usually limit myself to patents from the search engines and to patents about search engines. Never saw this one when it was published.

  17. What do you think about the apparent trend of Google favoring its own social media? Many blogs are reporting that google + is becoming much more relevant to searches than Facebook. Is this something you see continuing in the future?

  18. Relevance is in and matching keyword is out. It was long due and good it has been implemented for some searches. I am sure in the due course of time we will see these results in more and more queries. Great article. Basically keywords will stay it is only that the results will be broader in context to our query.

  19. I agree with some of the above comments. I’m not quite sure that search engines are ready to drop the concept of keywords quite yet. Sometimes it seems like there’s change merely for the sake of change and I don’t feel that it is a sign of progress or productivity. Still, it’s nice to see new technologies rise.

  20. Great piece Bill. This is going to lead to a piece of great content receiving more traffic than it previously would.

    It worries me a bit that popularity would be based on views and likes since those factors are easily gamed.

  21. I think trying to change a widely used platform of searching to topic based would be VERY tough for people to follow. I think the quality of results would be terrible based around topics rather than keywords. Google has continuously defined the best avenues of weeding out spam results and ranking relevant ones, so why change it up? If it ain’t broke, don’t fix it.

  22. Bill, it is a nice post and I really think about it what the main idea of this post, after hummingbird Google is trying to catch the long tail searches queries the people use to search for something and it affect the keywords but keyword also have the same results as it is before . What we need to do is to optimize our sites by using the keyword but it is better to use long tail keywords that are more helpful to get better rank in SERP.

  23. Thanks, Florian

    It appears that they had changed the URL for some reason. Thanks for letting me know – I’ve updated it, and saved a copy in case they change it again.

    Bill

  24. I noticed in topic based search results sometimes keywords are replaced due to some spelling mistakes or most relevant keywords to your research term. There are some people who say that after Hummingbird update the keyword system comes to an end because now Google consider whole the researched queries rather than a single keyword. After reading your post I am able to understand the logic of replacing some keywords in search engines.

  25. Great post! It definitely seems like we are going in this direction, some general things I’m seeing on the Moz blog and others is that we need to start focusing on and optimizing content for keyword topics rather than just the keywords themselves.

  26. You give a new idea and I really appreciate it but I still force on keyword based search. Google updates specially design to catch and crawl on keyword based search. As you say about the topic this can happen in some cases so you can’t take it lastingly as Google makes changes in algorithms.

  27. Insightful as always Bill – I guess this is the natural progression/extension from what we have already seen with synonyms for sometime?

    Big implications for the ‘old school’ SEO – ‘keyword’ led search campaigns, keyword specific tools, exact match domains etc…. Continued push towards big brands, ‘authorities’.
