Does a search engine work better if it can figure out whether or not a search query is a name?
The folks at Ask.com appear to think so, and even want to know if the name is that of someone famous. I’m not sure how they measure fame, but they have a method for flagging names of the famous, as well as queries that look like names, and names that really aren’t the names of people (Brandy Alexander, anyone?).
A patent application from Ask describes the process, and details how they might go about figuring out whether “Usher” or “50 Cent” or “Attila the Hun” refer to people, or to something else completely.
Systems and methods for predicting if a query is a name
Invented by Eric J. Glover, Apostolos Gerasoulis and Vadim Bich
US Patent Application 20070239735
Published October 11, 2007
Filed: April 5, 2006
Continue reading “Ask.com on Flagging Famous People”
Fact extraction is growing as a method that search engines can use to identify and understand what the pages of a website are about, to collect facts about document subjects, and to answer questions posed by people submitting queries to a search engine.
A recent paper from Google provides a nice overview of some methods being used for fact extraction. A Google patent application published last week explores looking at titles on pages, and at anchor text in related pages on the same domain, to determine the subjects of those pages.
The paper is Corroborate and Learn Facts from the Web (pdf), and the process described within it has been called GRAZER. Here’s a little about how it works:
It starts with facts imported from one website and takes them as known facts (seed facts). Then it tries to find mentions of the seed facts on other web sites. This involves retrieving relevant pages for each entity and then corroborating facts in them.
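To make the corroboration step a little more concrete, here is a minimal Python sketch. It is not the method from the paper. It assumes that a seed fact is a simple (entity, attribute, value) triple and that a page "mentions" a fact when both the entity and the value appear in its text. The function name, the sample pages, and the example.com style hosts are all made up for illustration.

```python
from collections import defaultdict


def corroborate(seed_facts, pages):
    """Return, for each seed fact, the pages that mention both its entity and value.

    seed_facts: list of (entity, attribute, value) triples (hypothetical shape).
    pages: dict mapping a page URL to the page text.
    """
    support = defaultdict(list)
    for entity, attribute, value in seed_facts:
        for url, text in pages.items():
            lowered = text.lower()
            # Naive "mention" test: entity and value both occur in the page text.
            if entity.lower() in lowered and value.lower() in lowered:
                support[(entity, attribute, value)].append(url)
    return dict(support)


# Hypothetical seed fact and toy pages, for illustration only.
seed_facts = [("Attila the Hun", "died", "453")]
pages = {
    "a.example": "Attila the Hun died in 453 after a feast.",
    "b.example": "Usher is a singer.",
}

result = corroborate(seed_facts, pages)
```

A real system would need much more than substring matching, such as handling name variants and checking that the value actually belongs to the attribute, but the seed, retrieve, and corroborate loop has the same overall shape.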
Continue reading “Google on Determining Document Subjects”
When we talk about how a search engine like Google crawls and indexes information from websites, it’s often in the context of the Web results that the search engine shows to searchers.
Facts in Web Results
But, with Universal Search and blended search results showing information from local search, question answering, definitions, and others, it may make sense to start paying more attention to how the search engine is extracting facts from pages, creating “objects” from those facts, and ranking those objects.
In a post from last September, I went into a lot of detail on a Google patent application titled Generating Structured Information, which focuses upon data practices with Local Search and discusses how facts and information were taken from the Web and included in a local search repository.
Explosion of Patent Filings
Continue reading “Google & Fact Extraction, Normalization, and Visualization”
How Might Google Handle Data Visualization?
Yesterday, I wrote about how Google might present facts extracted from pages in timelines or maps, according to a patent application filed last week.
It wasn’t the only piece of intellectual property coming out of the US Patent and Trademark Office for Google on the extraction and visualization of facts. Another, which may be even more interesting, describes the possibility of a user extracting facts found in a query of the fact database, and choosing to present those facts in a number of ways.
Designating data objects for analysis
Invented by Andrew W. Hogue, David J. Vespe, Alexander Kehlenbeck, Michael Gordon, Jeffrey C. Reynar, and David B. Alpert
US Patent Application 20070179965
Published August 2, 2007
Filed: January 27, 2006
Continue reading “Google on Data Visualization”
A list of search results isn’t always the best way to present information found in a search.
Google has recently come up with a couple of other interesting ways to show results related to a query, which might make you reconsider how you present dates and addresses on the pages of your website.
A map pointing out different facts related to a query might provide some interesting results:
Likewise, a timeline could show you some things that you might not expect to see from a search engine, especially if the facts used in response to the query came from different web pages:
Continue reading “Google Timelines, Fact Maps, and Fact Relevance Rankings”
Google described some of the Janitors it uses to crawl facts on the Web in a recent patent application.
Google has been working on extracting data from a wide variety of sources on the Web, but there are problems with a lot of that information. Some examples:
One site may use a certain format to present information, while other pages use different formats.
Information from one web page may contradict information from others.
Some data may become old and stale.
When Google crawls facts to collect this kind of information, a lot of it needs to be cleaned up, and Google’s “Janitors” spring into action to do that.
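The three problems listed above (mixed formats, contradictions, and stale data) can be sketched in a few lines of Python. This is only an illustration of the kind of cleanup involved, not how Google's Janitors actually work. The record layout, the list of date formats, the one-year freshness cutoff, and the function names are all assumptions of mine.

```python
from datetime import date, datetime

# Assumed set of date formats that different sites might use.
DATE_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%d %B %Y")


def normalize_date(raw):
    """Convert a date string in any known format to ISO form, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(facts, today, max_age_days=365):
    """Drop stale or unparseable facts, normalize the rest, and flag conflicts.

    facts: list of dicts with "source", "value", and "crawled" (a date).
    Returns (cleaned_facts, has_conflict).
    """
    cleaned = []
    for fact in facts:
        if (today - fact["crawled"]).days > max_age_days:
            continue  # stale: crawled too long ago
        value = normalize_date(fact["value"])
        if value is None:
            continue  # format not recognized
        cleaned.append({**fact, "value": value})
    # More than one distinct value after normalization means sources disagree.
    has_conflict = len({fact["value"] for fact in cleaned}) > 1
    return cleaned, has_conflict


# Hypothetical facts about the same date, as reported by three made-up sources.
facts = [
    {"source": "a.example", "value": "April 5, 2006", "crawled": date(2007, 9, 1)},
    {"source": "b.example", "value": "2006-04-05", "crawled": date(2007, 8, 1)},
    {"source": "c.example", "value": "5 April 2005", "crawled": date(2004, 1, 1)},
]

cleaned, has_conflict = clean(facts, today=date(2007, 10, 11))
```

In this toy run, the first two sources agree once their dates are put into the same format, and the third is dropped as stale before it can cause a contradiction.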
Continue reading “How Google Crawls Facts on the Web”