Google on Determining Document Subjects

Fact extraction is growing as a method that search engines can use to understand what the pages on a website are about, to collect facts about document subjects, and to answer questions posed by people submitting queries to a search engine.

A recent paper from Google provides a nice overview of some methods being used for fact extraction. A Google patent application published last week explores looking at titles on pages and at anchor text in related pages on the same domain to determine document subjects for pages.

The paper is Corroborate and Learn Facts from the Web (pdf), and the process described within it has been called GRAZER. Here’s a little about how it works:

It starts with facts imported from one website and takes them as known facts (seed facts). Then it tries to find mentions of the seed facts on other websites. This involves retrieving relevant pages for each entity and then corroborating the facts found in them.
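The seed-fact step described above can be pictured with a small sketch. This is not the method from the paper, only a rough illustration under stated assumptions: the seed facts, the page texts, the URLs, and the function name `corroborate` are all made up for the example, and "finding a mention" is reduced to a plain case-insensitive check that the entity and the value both appear in a page.

```python
from collections import defaultdict

# Hypothetical seed facts: (entity, attribute, value) triples taken as known.
SEED_FACTS = [
    ("Bono", "birthplace", "Dublin"),
    ("Japan", "capital", "Tokyo"),
]

# Hypothetical page texts standing in for pages retrieved for each entity.
PAGES = {
    "example.org/bono": "Bono was born in Dublin, Ireland.",
    "example.net/music": "The singer Bono grew up in Dublin.",
    "example.com/japan": "Tokyo is the capital of Japan.",
}


def corroborate(seed_facts, pages):
    """Map each seed fact to the pages that mention both its entity and value.

    Assumption: a "mention" is simple substring co-occurrence, which is far
    cruder than what a real extraction system would use.
    """
    support = defaultdict(list)
    for entity, attribute, value in seed_facts:
        for url, text in pages.items():
            lowered = text.lower()
            if entity.lower() in lowered and value.lower() in lowered:
                support[(entity, attribute, value)].append(url)
    return dict(support)


if __name__ == "__main__":
    for fact, urls in corroborate(SEED_FACTS, PAGES).items():
        print(fact, "is supported by", len(urls), "page(s)")
```

Counting supporting pages per fact is one simple way to think about corroboration; a fact seen on several independent sites is easier to trust than one seen on a single page.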

Continue reading “Google on Determining Document Subjects”

Google & Fact Extraction, Normalization, and Visualization

When we talk about how a search engine like Google crawls and indexes information from websites, it’s often in the context of the Web results that the search engine shows to searchers.

Facts in Web Results

But, with Universal Search and blended search results showing information from local search, question answering, definitions, and others, it may make sense to start paying more attention to how the search engine is extracting facts from pages, creating “objects” from those facts, and ranking those objects.

In a post from last September, I went into a lot of detail on how a Google patent application focusing upon data practices with Local Search, titled Generating Structured Information, discussed how facts and information were taken from the Web and included in a local search repository.

Explosion of Patent Filings

Continue reading “Google & Fact Extraction, Normalization, and Visualization”

Google on the Extraction and Visualization of Facts

Yesterday, I wrote about how Google might present facts extracted from pages in timelines or maps, according to a patent application filed last week.

It wasn’t the only piece of intellectual property coming out of the US Patent and Trademark Office for Google on the extraction and visualization of facts. Another, which may be even more interesting, describes the possibility of a user extracting facts found in a query of the fact database, and choosing to present those facts in a number of ways.

Designating data objects for analysis
Invented by Andrew W. Hogue, David J. Vespe, Alexander Kehlenbeck, Michael Gordon, Jeffrey C. Reynar, and David B. Alpert
US Patent Application 20070179965
Published August 2, 2007
Filed: January 27, 2006

Abstract

Continue reading “Google on the Extraction and Visualization of Facts”

Google Timelines, Fact Maps, and Fact Relevance Rankings

A list of search results isn’t always the best way to present information found in a search.

Google has recently come up with a couple of other interesting ways to show results related to a query, which might make you reconsider how you present dates and addresses on the pages of your website.

A map pointing out different facts related to a query might provide some interesting results:

A Map of Facts From a Google Patent Application

Likewise, a timeline could show you some things that you might not expect to see from a search engine, especially if the facts used in response to the query came from different web pages:

Continue reading “Google Timelines, Fact Maps, and Fact Relevance Rankings”

Google Janitors Clean Up Facts on the Web

Google described some of its janitors in a recent patent application.

Google has been working on extracting data from a wide variety of sources on the Web, but there are problems with a lot of that information. Some examples:

One site may use a certain format to present information, while other pages use different formats.

Information from one web page may contradict information from others.

Some data may become old and stale.

When Google collects this kind of information, a lot of it needs to be cleaned up, and Google’s “Janitors” spring into action to do that.
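The three problems listed above (mixed formats, contradictions, stale data) can be made concrete with a toy cleanup pass. This is only a sketch and does not reflect how the patent application describes its janitors: the date formats, the one-year staleness cutoff, the majority-vote rule, and the names `normalize_date` and `clean` are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import date, datetime

# Assumed input formats; a real system would need to handle many more.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")


def normalize_date(raw):
    """Convert a date string in any assumed format to ISO form, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(observations, today, max_age_days=365):
    """Pick one value for a fact from (raw_value, date_seen) observations.

    Stale observations are dropped, the rest are normalized to one format,
    and contradictions are settled by a simple majority vote (an assumption).
    """
    fresh = [
        normalize_date(raw)
        for raw, seen in observations
        if (today - seen).days <= max_age_days
    ]
    fresh = [value for value in fresh if value is not None]
    if not fresh:
        return None
    return Counter(fresh).most_common(1)[0][0]


if __name__ == "__main__":
    observations = [
        ("2007-08-02", date(2007, 8, 3)),
        ("02.08.2007", date(2007, 8, 4)),   # same date, different format
        ("2007/08/01", date(2007, 8, 5)),   # contradicts the others
        ("2007-01-01", date(2005, 1, 1)),   # stale observation
    ]
    print(clean(observations, today=date(2007, 8, 10)))
```

Normalizing before voting matters here: without it, the two differently formatted copies of the same date would count as separate, competing values.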

Continue reading “Google Janitors Clean Up Facts on the Web”

Google Q&A Patent Applications?

Depending upon how you phrase a query, Google will sometimes jump into a Q&A (question and answer) mode. This is described on one of Google’s help pages:

Want to know the population of Japan? What currency is used in Algeria? The birthplace of Bono? Hit us with a fact-based question or query (like “population of Japan”) by typing it into the Google search box. We’ll search the web and display the answer at the top of your search results page. We also link to our source for this information so that you can learn even more.

Google Web Search Features

The same feature is available through a mobile phone service from Google, a little more straightforward in that you know you are only in Q&A mode when asking a question:

Continue reading “Google Q&A Patent Applications?”