Search Based upon Concepts: Applied Semantics and Google
A recently granted Google patent from the founders of Applied Semantics describes a search interface that could help searchers find web pages based upon the meanings of their queries, rather than only pages that contain the words they typed.
In the late 1990s, Adam Weissman and Gilad Elbaz set out to build a search engine that would search on meanings or concepts instead of keywords. With a few friends and family members, they formed a company named Oingo, and along the way they filed for a patent on searching based upon meanings rather than keywords.
The technology they developed could be used in a number of ways in addition to search, and provided an interesting alternative to keyword based search that would lead to some significant developments in the world of search engines.
Oingo Changes Directions
Around the same time that Oingo was developing its technology, Google launched with a focus on matching the keywords in queries, and it started catching on with the public. Eytan Elbaz, another Oingo founder, noted in a 2008 interview:
In 1999, “my older brother and his friend and a cousin came up with this idea for Oingo, a meaning-based search engine. We came up with this idea for making search engines better so you’d search based on meaning rather than text.
“We went down that path for a year when it occurred to us that people didn’t want to search like this, they wanted to search on Google.
“We decided to shift gears and focus more on internet ads and contextualize target using the same technology.”
While Oingo didn’t offer search directly to the public, it started out by creating search tools that people could use for free on their own sites, and had some success building a search that worked with the Open Directory Project. The company then diversified, developing a meaning- or concept-based advertising system and filing for a patent on it.
They also started building enterprise level tools, and the company changed its name in May of 2001 to Applied Semantics, to “better reflect Oingo’s altered business model.”
They published a number of whitepapers that describe the CIRCA technology (Conceptual Information Retrieval and Communication Architecture) that they developed, which focuses upon understanding conceptual information behind strings of text in a number of information management applications.
- CIRCA Technology Overview (pdf)
- CIRCA Technology: Applying Meaning to Information Management (pdf)
- Ontology Usage and Applications (pdf)
Applied Semantics also launched AdSense in 2002, using its CIRCA technology to identify and extract the central concepts on a page so that it could deliver ads matching the context of the pages they appeared on.
Google and Applied Semantics Merge
In 2003, Google merged with Applied Semantics.
The most obvious result of that merger is Google’s use of the technology in its AdSense offering, but Google may be using the CIRCA technology in other ways behind the scenes. The two Applied Semantics patents linked above were reassigned to Google long ago. And a patent filed a few months after the merger, listing Adam Weissman and Gilad Elbaz as inventors, was granted this week.
What makes the Web search technology developed by Oingo different from the keyword matching approach initially used by Google? From one of the original pages on Oingo.com:
The value of Oingo’s meaning-based method goes well beyond the mere filtering of irrelevant results.
The true power of our technology is demonstrated with a query such as “shopping for fishing gear”. Once exhausting a search for Web sites containing all three words, a traditional text-based search engine resorts to looking for two-of-three word matches. This search could yield results about shopping and gear, but having nothing at all to do with fishing, or conversely about fishing gear, but nothing to do with shopping!
An Oingo meaning-based search does not give up so easily; it essentially tries hundreds of possible combinations of related terms before giving up on finding information related to all three concepts. Consider the following examples of highly relevant results for this query: “Buying Fishing Equipment Online”, “Fishing Equipment Retail Shops”, and “Gifts for Fishing Enthusiasts”.
A traditional text-based search cannot possibly see the high relevancy of these results unless all three of the specific search words just happen to appear together on the page.
Because of this, traditional text-based search results can seem arbitrary, even random, at times. Replace the word “employment” with “jobs”, for example, and you will typically get entirely different results. By searching on meanings, instead of just words, our search eliminates this “randomness” of results.
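The contrast Oingo describes can be shown with a toy comparison. In this minimal Python sketch, the CONCEPTS map, the stop-word list, and the sample titles are hypothetical; keyword_score counts literal query words found in a title, while concept_score counts how many of the query’s concepts a title covers through any related word. It illustrates the general idea only and is not Oingo’s actual algorithm.

```python
# Hypothetical concept map: word -> concept label (illustrative only).
CONCEPTS = {
    "shopping": "PURCHASE", "buying": "PURCHASE", "retail": "PURCHASE", "shops": "PURCHASE",
    "fishing": "FISHING", "angling": "FISHING",
    "gear": "EQUIPMENT", "equipment": "EQUIPMENT", "tackle": "EQUIPMENT",
}
STOP_WORDS = {"a", "for", "of", "the"}

def terms(text):
    """Lowercase the text and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def keyword_score(query, title):
    """Count literal query words that also appear in the title."""
    return len(set(terms(query)) & set(terms(title)))

def concept_score(query, title):
    """Count query concepts the title covers through any related word."""
    wanted = {CONCEPTS[w] for w in terms(query) if w in CONCEPTS}
    covered = {CONCEPTS[w] for w in terms(title) if w in CONCEPTS}
    return len(wanted & covered)

def rank(query, titles, score):
    """Sort titles by descending score, breaking ties alphabetically."""
    return sorted(titles, key=lambda t: (-score(query, t), t))

query = "shopping for fishing gear"
titles = [
    "Buying Fishing Equipment Online",
    "Fishing Equipment Retail Shops",
    "Shopping for Camping Gear",
]
print(rank(query, titles, keyword_score))
print(rank(query, titles, concept_score))
```

Keyword scoring puts "Shopping for Camping Gear" first because it shares two of the three query words, which is exactly the two-of-three fallback the quote complains about. Concept scoring ranks the two fishing-equipment titles first, since each covers all three concepts even though only "fishing" appears literally.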
The newly granted patent from Google describes a different way of searching. A searcher enters search terms, and the search interface can then present additional options after that initial query, letting the searcher choose among the different concepts the query terms might fall under and define other aspects of a search that focuses more on meanings and concepts than on the keywords themselves.
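To make that kind of interaction concrete, here is a minimal Python sketch, assuming a small hypothetical sense inventory (SENSES) in which an ambiguous term like "bass" maps to two concepts, each with a few related words. The names concept_options, refine, and search are illustrative and do not come from the patent: the interface offers the concepts, the searcher picks one, and the query is expanded with that concept’s related words before documents are ranked.

```python
# Hypothetical sense inventory: ambiguous term -> {concept label: related words}.
SENSES = {
    "bass": {
        "bass (fish)": ["bass", "fishing", "lure", "lake"],
        "bass (music)": ["bass", "guitar", "amplifier", "strings"],
    },
}

def concept_options(query):
    """List the concepts each ambiguous query term might fall under."""
    return {w: list(SENSES[w]) for w in query.lower().split() if w in SENSES}

def refine(query, choices):
    """Swap each ambiguous term for the related words of the concept the searcher chose."""
    expanded = []
    for w in query.lower().split():
        if w in choices:
            expanded.extend(SENSES[w][choices[w]])
        else:
            expanded.append(w)
    return expanded

def search(terms, docs):
    """Rank documents by how many expanded terms they contain, dropping non-matches."""
    scored = []
    for doc in docs:
        words = set(doc.lower().split())
        scored.append((sum(t in words for t in terms), doc))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc for score, doc in scored if score > 0]

docs = [
    "Best lure colors for lake bass",
    "Choosing strings for your bass guitar",
    "Tube amplifier buying guide",
]
print(concept_options("bass"))
print(search(refine("bass", {"bass": "bass (fish)"}), docs))
print(search(refine("bass", {"bass": "bass (music)"}), docs))
```

The same one-word query returns the fishing article first when the searcher picks the fish concept, and the guitar article first when the searcher picks the music concept.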
Less Keyword Matching, More Attempts to Understand Meaning?
You may have noticed that more and more searches at Google, Yahoo, and Bing offer not only search results but also query refinement suggestions, and in the case of Bing, sometimes choices of results from different categories that may be associated with a query term.
The predictive suggestions in the dropdowns under the search boxes at those search engines also attempt to guess at the intent behind a search.
Google’s announcement a couple of months ago that it might expand queries typed in by searchers to include synonyms is also a move away from strict keyword matching, toward results that more closely match the meaning behind a search rather than just its words.
The patent is at:
Methods and systems for detecting and extracting information
Invented by Adam J. Weissman and Gilad Israel Elbaz
Assigned to Google
US Patent 7,689,536
Granted March 30, 2010
Filed: December 18, 2003
Systems and methods that detect information and extract information are described.
In one aspect, target rules are defined for detection of target hits in an article, including defining a target article region, extraction rules are defined based on the target rules for the extraction of extracts from the article, including an extraction article region, target rules are applied to each target article region of the article to determine target hits, and extraction rules are applied to detect at least one extract from the article based on the determined target hit.
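The abstract is dense, so here is one possible reading expressed as a minimal Python sketch. The class names, the regular-expression rules, and the sample articles are all hypothetical: a target rule scans a named target article region for a pattern, and an extraction rule tied to that target is applied to its own extraction article region only when the target rule has produced a hit. The patent’s claims may define these rules quite differently.

```python
import re
from dataclasses import dataclass

@dataclass
class TargetRule:
    name: str     # label for the kind of target this rule detects
    pattern: str  # regular expression whose match counts as a target hit
    region: str   # target article region to scan, e.g. "title" or "body"

@dataclass
class ExtractionRule:
    target: str   # name of the TargetRule that must fire before extracting
    pattern: str  # regular expression describing the extract
    region: str   # extraction article region to pull the extract from

@dataclass
class Hit:
    rule: str
    region: str
    start: int
    end: int

def find_hits(article, target_rules):
    """Apply each target rule to its target region and record every match as a hit."""
    hits = []
    for rule in target_rules:
        text = article.get(rule.region, "")
        for m in re.finditer(rule.pattern, text, flags=re.IGNORECASE):
            hits.append(Hit(rule.name, rule.region, m.start(), m.end()))
    return hits

def extract(article, hits, extraction_rules):
    """Run only the extraction rules whose target rule produced at least one hit."""
    fired = {h.rule for h in hits}
    extracts = []
    for rule in extraction_rules:
        if rule.target not in fired:
            continue
        text = article.get(rule.region, "")
        extracts.extend(m.group(0) for m in re.finditer(rule.pattern, text))
    return extracts

# Hypothetical rules: a job-posting target in the title triggers salary extraction from the body.
target_rules = [TargetRule("job_posting", r"\bhiring\b", "title")]
extraction_rules = [ExtractionRule("job_posting", r"\$[\d,]+", "body")]

job = {"title": "Acme is hiring a data analyst", "body": "The role pays $85,000 per year."}
news = {"title": "Acme opens a new office", "body": "Rent is $85,000 per year."}

for article in (job, news):
    hits = find_hits(article, target_rules)
    print(len(hits), extract(article, hits, extraction_rules))
```

Both bodies contain the same dollar amount, but only the job posting yields an extract, because the extraction rule depends on a target hit in the title.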
The patent may hint at additional search interfaces that Google might show in the future that are intended to help searchers find better matches for the meaning behind their query terms rather than just good matches for the keywords used in their search.