Search Based upon Concepts: Applied Semantics and Google

A recently granted Google patent from the founders of Applied Semantics discusses a search interface that could help searchers find web pages based upon the meanings of their queries rather than just pages that include those keywords.

In the late 90s, Adam Weissman and Gilad Elbaz decided to start a search engine that would search on meanings or concepts instead of keywords. Along with a few friends and family, they formed a company named Oingo, and along the way filed for a patent on a search based upon meanings rather than keywords.

The technology they developed could be used in a number of ways in addition to search, and provided an interesting alternative to keyword based search that would lead to some significant developments in the world of search engines.

Oingo Changes Directions

Roughly around the same time that Oingo was developing their technology, Google launched, focusing upon matching keywords in queries, and started catching on with the public. Eytan Elbaz, another Oingo founder, noted in a 2008 interview:

In 1999, “my older brother and his friend and a cousin came up with this idea for Oingo, a meaning-based search engine. We came up with this idea for making search engines better so you’d search based on meaning rather than text.

We went down that path for a year when it occurred to us that people didn’t want to search like this, they wanted to search on Google.

“We decided to shift gears and focus more on internet ads and contextual targeting using the same technology.”

While Oingo didn’t offer search directly to the public, they started out creating search tools that people could use for free on their sites, and had some success in building a search that worked with the Open Directory Project. They diversified, and developed and filed for a patent on a meaning or concept-based advertising system.

They also started building enterprise level tools, and the company changed its name in May of 2001 to Applied Semantics, to “better reflect Oingo’s altered business model.”

They published a number of whitepapers that describe the CIRCA technology (Conceptual Information Retrieval and Communication Architecture) that they developed, which focuses upon understanding conceptual information behind strings of text in a number of information management applications.

Applied Semantics developed a way to use that technology with news content, which was used by USA Today.

Applied Semantics also launched AdSense in 2002, using their CIRCA technology to understand and extract the central concepts that appear on pages, in order to deliver advertising that matched the context of the pages those ads appeared upon.

Google and Applied Semantics Merge

In 2003, Google merged with Applied Semantics.

The most obvious result of that merger is Google’s use of the technology in their AdSense offering, but there may be more to Google’s use of the CIRCA technology behind the scenes. The two Applied Semantics patents linked above have long been reassigned to Google. And a patent filed a few months after the merger, listing Adam Weissman and Gilad Elbaz as inventors, was just granted this week.

What makes the Web search technology developed by Oingo different from the keyword matching approach initially used by Google? From one of the original pages on Oingo.com:

The value of Oingo’s meaning-based method goes well beyond the mere filtering of irrelevant results.

The true power of our technology is demonstrated with a query such as “shopping for fishing gear”. Once exhausting a search for Web sites containing all three words, a traditional text-based search engine resorts to looking for two-of-three word matches. This search could yield results about shopping and gear, but having nothing at all to do with fishing, or conversely about fishing gear, but nothing to do with shopping!

An Oingo meaning-based search does not give up so easily; it essentially tries hundreds of possible combinations of related terms before giving up on finding information related to all three concepts. Consider the following examples of highly relevant results for this query: “Buying Fishing Equipment Online”, “Fishing Equipment Retail Shops”, and “Gifts for Fishing Enthusiasts”.

A traditional text-based search cannot possibly see the high relevancy of these results unless all three of the specific search words just happen to appear together on the page.

Because of this, traditional text-based search results can seem arbitrary, even random, at times. Replace the word “employment” with “jobs”, for example, and you will typically get entirely different results. By searching on meanings, instead of just words, our search eliminates this “randomness” of results.
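The “jobs” versus “employment” contrast above can be illustrated with a small sketch. This is a toy illustration, not CIRCA itself: the concept lexicon and function names below are hypothetical, and a real system would draw on a large ontology rather than a hand-built dictionary.

```python
# Toy lexicon mapping words to concept identifiers (hypothetical data).
CONCEPTS = {
    "jobs": "EMPLOYMENT",
    "employment": "EMPLOYMENT",
    "shopping": "COMMERCE",
    "buying": "COMMERCE",
    "fishing": "FISHING",
    "gear": "EQUIPMENT",
    "equipment": "EQUIPMENT",
}

def to_concepts(text):
    """Map the words of a text to the set of concepts they express."""
    return {CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS}

def concept_match(query, page_title):
    """A page matches when it covers every concept in the query,
    even if it shares no literal words with the query."""
    return to_concepts(query) <= to_concepts(page_title)

# "Buying Fishing Equipment Online" shares no words with the query,
# yet covers all three concepts behind "shopping for fishing gear".
print(concept_match("shopping for fishing gear",
                    "Buying Fishing Equipment Online"))
```

Note how “jobs” and “employment” map to the same concept set, so swapping one for the other would no longer change the results, which is exactly the “randomness” the Oingo page describes eliminating.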

The newly granted patent from Google describes a different way of searching. A searcher not only enters search terms, but works through a search interface that can present additional options after the initial query, enabling the searcher to choose between different concepts that the initial query terms might fall under, and to define other aspects of a search that focuses more upon meanings and concepts than upon the keywords themselves.
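As a rough illustration of the kind of interface described, here is a hypothetical sketch in which an ambiguous query term triggers a list of concept choices for the searcher to pick from before results are returned. The sense inventory and function names are invented for the example, not taken from the patent.

```python
# Toy sense inventory for ambiguous query terms (hypothetical data).
SENSES = {
    "jaguar": ["Jaguar (animal)", "Jaguar (automobile)"],
    "bass": ["Bass (fish)", "Bass (instrument)"],
}

def refinement_options(query):
    """Return the concept choices to offer for an ambiguous query,
    or None when no refinement is needed."""
    for word in query.lower().split():
        senses = SENSES.get(word)
        if senses and len(senses) > 1:
            return senses
    return None

print(refinement_options("jaguar habitat"))
print(refinement_options("weather today"))
```

In a real interface, the chosen sense would then constrain the result set, rather than the engine guessing a single meaning from the keywords alone.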

Less Keyword Matching, More Attempts to Understand Meaning?

You may have noticed that more and more searches at Google, Yahoo, and Bing offer not only search results but also query refinement suggestions, and in the case of Bing, sometimes choices of results from different categories that may be associated with a query term.

The predictive results in the dropdown under the search boxes at those search engines also attempt to guess at the intent behind a search.

Google’s announcement a couple of months ago that they might expand searchers’ queries to include synonyms is also a move away from strict keyword matching, toward results that more closely match the meaning behind a search rather than just the words.

The patent is at:

Methods and systems for detecting and extracting information
Invented by Adam J. Weissman and Gilad Israel Elbaz
Assigned to Google
US Patent 7,689,536
Granted March 30, 2010
Filed: December 18, 2003

Abstract

Systems and methods that detect information and extract information are described.

In one aspect, target rules are defined for detection of target hits in an article, including defining a target article region, extraction rules are defined based on the target rules for the extraction of extracts from the article, including an extraction article region, target rules are applied to each target article region of the article to determine target hits, and extraction rules are applied to detect at least one extract from the article based on the determined target hit.
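One plausible reading of that abstract, sketched as a toy example: treat each sentence of an article as a “target article region,” apply a target rule (here just a regular expression) to find target hits, and apply an extraction rule that pulls an extract from a window around each hit. The rule shapes and function names are guesses for illustration, not the patent’s own.

```python
import re

def find_extracts(article, target_rule, extraction_window=1):
    """Apply a target rule to each sentence (the "target article
    region") and return an extract built around each target hit."""
    sentences = [s.strip() for s in article.split(".") if s.strip()]
    extracts = []
    for i, sentence in enumerate(sentences):
        if re.search(target_rule, sentence, re.IGNORECASE):
            # The "extraction article region": the hit sentence plus
            # a window of neighboring sentences.
            lo = max(0, i - extraction_window)
            hi = min(len(sentences), i + extraction_window + 1)
            extracts.append(". ".join(sentences[lo:hi]) + ".")
    return extracts

article = ("The aquarium opened in 1998. Sharks are fed twice daily. "
           "Tickets cost ten dollars.")
print(find_extracts(article, r"\bsharks?\b"))
```

Under this reading, a single relevant sentence can be detected and extracted even when the rest of the page has nothing to do with the query.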

The patent may hint at additional search interfaces that Google might show in the future that are intended to help searchers find better matches for the meaning behind their query terms rather than just good matches for the keywords used in their search.


23 thoughts on “Search Based upon Concepts: Applied Semantics and Google”

  1. It’s interesting to see how Google is progressing with their intention to expand searchers’ queries to include synonyms. This seems to be a natural evolution which, in a few years, will make keyword-based queries seem very old-fashioned.

  2. I use Google.co.in and found that Google is already using synonyms when searching for a keyword. It is even more interesting that when we misspell a word, it suggests the correct one. Google is becoming more and more intelligent.

  3. “The newly granted patent from Google describes a different way of searching, in which a searcher not only enters search terms, but through a search interface that can present additional options after the initial query, enabling searchers to choose between different concepts that the initial query terms might be included under, and define other aspects of a search that focuses more upon meaning and concepts than the keywords themselves.”

    I think this can only be a good thing; more accurate results mean happy searchers. However, could this potentially cause problems for SEOs? Imagine a user searching with a specific query; they’re then presented with refinements to help their search, which are then used, leading the user to a site.

    Could it be that in the future it’s no longer any good to rank highly for keywords; you need to rank highly for the refined keywords? The big questions are, what are the refinements going to be, and can we track what the most popular ones are?

    Looking through what Oingo is saying, it sounds like highly authoritative sites which rank for a wide range of keywords would consistently be ranked highly in their system. I think this would sort the SEO pros, who are targeting the long tail, from the single-keyword-focused amateurs.

    I look forward to seeing what Google’s next move is.

  4. Hi Paul,

    It does seem like a natural evolution, doesn’t it?

    Looking back at the keyword matching that Google offered versus the kind of search that Oingo offered – searching by concept – I’m wondering if Google was more successful back then because it was easier for the public to grasp why the results they were seeing showed up in search results. The pages that appeared were pages that contained those words.

    Being told that certain pages are the “best matches” for a search even though the query terms don’t appear on those pages might be harder for searchers to accept. How did a search engine decide that those are the best pages? What if the “concepts” it believed were related to the query weren’t what searchers actually intended?

    Presenting query expansion as a way to capture synonyms is still something that searchers can grasp and understand, much like being told that pages were chosen because they contained the keywords used in a search. I actually prefer the idea of offering suggested query refinements that searchers can choose, to see related concepts, over simply inserting pages into search results because they might cover related concepts – I’m not sure that keyword matching itself will go away.

    I believe that even Oingo provided the chance to see results that were straight-up keyword matches. I don’t know if we will ever lose those keyword matches, but hopefully we will have more options in the future to refine our search results based upon other concepts that might match the intent behind our queries.

  5. Hi smsinhindi,

    Google did announce on the Official Google Blog that they would be inserting results for synonyms in search results a couple of months ago, and those suggested spelling corrections have been around for a number of years now. Google isn’t alone in providing more and more useful options, though. We’re seeing similar features at Bing and Yahoo. And it’s good to see.

  6. Hi Caliban,

    Interesting thoughts. Thank you for sharing them.

    I think the ability to refine and target search results more intelligently is a great idea, and will help searchers tremendously. It may mean more work for people doing SEO, but if they are aware of these types of changes, it may benefit them as well.

    I would guess that the best way to track and understand potential query refinements is to spend a good amount of time studying search results themselves, and using alerts and tools that track trends within industries, as well as spending time interacting with people who might be interested in the goods or services or information that a site has to offer.

    It’s hard to tell if the processes in the patent from the Oingo founders will target sites that are “highly” authoritative. One example of how precise their system’s targeting can be is returning a page because it contains a single sentence relevant to a search on “sharks”, even though the rest of the page doesn’t, and the page wouldn’t particularly rank well for that query term on its own. The sentence by itself, however, is a very good match for the query. As you note, that could benefit people who pay attention to providing results that include long tail terms.

    It does seem that the evolution, not just at Google, is toward more intelligent search interfaces that show additional concepts and categories which might capture the different possible meanings behind a query, and that allow searchers to dig down through search results via refinements instead of having to choose from a wide range of results that might have little to do with the intent behind their search. That’s just my guess, but I think we’re seeing some signs of that happening now.

  7. This is really good, being able to search on meanings or concepts instead of keywords, because some words are spelled the same way but differ in meaning. With this, searches would not be futile anymore; people would love to search because every possible connection to their query would come up, including synonyms. This shows that Google is really thinking of the welfare of searchers and will do everything they can to provide the best and most useful information possible.

  8. What a fantastic insight into what could become the norm in Google’s SERPs.

    It is a logical step towards more accurate, cleaner results on the search engines’ results pages.

    We all want cleaner SERPs, without spam covering many of the results. We are a hungry beast; we expect answers to our questions and want the best possible returns after a search.

    Looking forward to this eventually happening, if it doesn’t happen somewhat already.

    Bill, where do you find all this info, and where do you get the time to research all the patents and possible usages?

  9. I wonder if this technology will be coupled with some type of knowledge of the personal web history of a user. If I type a question into my Firefox search box, it pulls up related questions that I have asked as suggestions. When I am searching in this form on Google, I get what appear to be standard searches that many others have conducted. If I begin typing my question with “what …”, a suggestion will be “whataburger” or “what celebrity do I look like”. I like the idea behind this patent, but we may find that different users have different intents while phrasing a query in a similar fashion.

  10. Hi Lee,

    Thank you. I think that’s the ultimate goal of the major search engines.

    Bill, where do you find all this info and where do you get the time to research all the patents and possible usages?

    I like looking at primary resources such as patent filings, as well as whitepapers and blog posts from the search engines, because they provide information directly from the sources in most instances. I feel that it’s necessary for me to take time and do some research every day, and to try to use the best sources that I possibly can as a matter of due diligence.

    I do also subscribe to a pretty large number of SEO, marketing, design, and other blogs as well through RSS.

    Researching patent filings and white papers gives me a chance to:

    1. Have interesting and somewhat unique material to blog about
    2. Stay current, and possibly a little ahead on potential changes to the search landscape
    3. Develop and maintain a level of expertise on search related topics
    4. Find good ideas to experiment with and think about
    5. Raise questions about why things are ranked the way that they are, and why search engines do some of the things they do
    6. Incorporate meaningful ideas and changes into recommendations to clients based upon research and experimentation
    7. Use what I’ve learned in other ways as well.

    I think the research helps me to make some better decisions than I otherwise might, saves me from making some possibly harmful assumptions about possible SEO practices, and gives me the opportunity to come up with some proactive approaches to possible changes at the search engines. So the time spent researching enables me to work smarter.

  11. Hi Frank,

    It’s very much possible that some increased interactions between searcher and a search interface could help a search engine capture more data that might be helpful for personalizing search for that searcher. If there isn’t much information for an individual that could be used by a search engine to make personalized recommendations (or provide personalized search results), the search engine might back off from that individual’s history to look at the search histories of people who might be perceived as having many of the same interests as that searcher, and see what kind of queries they used, query refinements they might have made or clicked upon, and what kinds of pages they may have visited.

    I think that you’re right in questioning whether past queries and browsing history are a good indication of present intent, regardless of whether we are discussing an individual’s prior history or many searchers’ decisions on what to search for, click, and view. A good number of searches are initiated because of some situational need for information, or a requirement to perform a task, that may have little to do with past history. Data collected from a process like this might not be helpful in those instances where situational relevance is more important than relevance based upon a searcher’s implicit profile.

    The good thing is that a process like this would aim at multiple opportunities for a searcher to refine their search, targeting what they are interested in at the moment rather than what they were interested in during previous searches or browsing sessions.

    And if it were tied to personalization, while some results shown to searchers might be influenced by user data, there would likely be a level of diversification of results that would not be influenced by that information to allow searchers with a situational intent to find what they were looking for as well.

  12. This is what makes Google grab a huge chunk of the market share. Their intelligence in searches is sometimes too much to handle. You don’t even know most of the things they use until you read about patents like this that leave you saying, “wow!”

  13. Hi Kevin,

    That’s definitely one of the reasons why I like looking through the patents published by the search engines. Yahoo and Microsoft have patented some interesting technology as well over the past few years, and it will be interesting to see some of those approaches get implemented.

  14. A prescient post, as it’s just a month later and Google Squared (or at least part of it) apparently is ready for prime time. There’s no disambiguation in the examples Google has provided so far, but clearly data is being extracted and returned when Google judges it has correctly deciphered the intent of the query and has a high-confidence match.

    In the Google Squared example given in the announcement, it’s clear Squared understands the equivalency of “date of birth” and “born”, just as in the Oingo “jobs” and “employment” example. In the regular SERPs, of course, they’re still returning web pages rather than the extracted information in the Squared vertical.

    It’s interesting to me that for a number of the web sources cited (“according to”) the data isn’t structured – but they’re all sites where there’s a large set of consistently labeled page elements. If they are using an ontology in the processing of the query I’d bet that sites with structured data to compare that query against would have a much higher chance of ranking for that query.

  15. Hi Aaron,

    Thanks. The Official Google Blog post, Understanding the web to find short answers and “something different”, was interesting to see. Another Google patent on the technology behind Applied Semantics was granted last month, and I’ve been slowly making my way through it. It’s possible that there’s a connection between the meaning-based lexicon that Applied Semantics brings and the indexing of facts with related attributes used behind Google Squared. Definitely worth exploring and pondering.

  16. Hi Prodigal Webmaster,

    Nice post. I’ve read the patent you’ve written about before when it was still a pending patent application, and it’s definitely one worth paying attention to.

    You definitely need to be careful how you link to those – right now you’ve linked to it as a search, which means that a different patent might start appearing after someone clicks upon your link if there are more results for that particular search in the future. Here’s a different way of linking to that patent:

    http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=1&f=G&l=50&d=PALL&S1=08024326&OS=PN/08024326&RS=PN/08024326

    or

    http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=08024326

    Using either will mean that the correct patent, “Methods and systems for improving a search ranking using related queries” shows up in the future when people click on your link.

    The paper that you cited as well, “Statistical Machine Translation for Query Expansion in Answer Retrieval” is also definitely worth spending some time with.
