Category Archives: Fact Extraction and Knowledge Graphs

Techniques and approaches that search engines might use to extract facts and information from the Web, as uncovered in search-related patents and whitepapers.

Entity Associations with Websites and Related Entities

When we talk about how web sites are related, it’s not unusual for us to talk about links between sites and pages. Google pays a lot of attention to such links, and they are at the heart of one of its best-known ranking signals, PageRank. PageRank is now more than 15 years old, predating Google itself, with its origins in the BackRub search engine.

Google is exploring other signals that may be used to rank pages in search results, including social signals that may result in reputation scores for authors, relationships between words that might appear together on pages ranking for the same queries, and relationships between pages that show up in the same search results and in the same search sessions. A Google paper presented at an October 2013 natural language processing conference, Open-Domain Fine-Grained Class Extraction from Web Search Queries (pdf), provides some interesting hints at a possible Google of the future.

Google also seems to be very interested in building a knowledge base of concepts that better captures things like what different businesses or entities are ‘Known for’, and that defines entities better through ‘is a’ relationships. Sometimes pages for specific entities show up at the top of search results because they seem to be the pages that people are looking for when they include that entity within a query, like the first two results on a search for [Roald Dahl], as seen in the image below:

Search results showing authoritative results for Roald Dahl and then results for books he wrote.

Continue reading

How Google Finds ‘Known For’ Terms for Entities

Google finds terms and phrases to associate with entities that can be considered terms of interest for businesses, locations, and other entities. These terms can influence what shows up in search results and in knowledge panels for those entities. Consider it part of a growing knowledge base of concepts, entities, attributes for entities, and keywords that shape the new Google after Hummingbird. Semantics play a role as Google identifies the things that specific entities are known for.

The Red Truck Bakery in Warrenton, Virginia

For example, the Warrenton, Virginia, Red Truck Bakery (local to me) is known for:

Continue reading

How Google Decides What to Know in Knowledge Graph Results

A transformation was triggered at Google with their announcement of the Knowledge Graph in the Official Google Blog post, Introducing the Knowledge Graph: things, not strings. That transformation was one less concerned with matching keywords, and more concerned with matching concepts, understanding entities, and bringing knowledge about entities to searchers in knowledge panels next to search results.

Google published a patent application last week that describes the knowledge panels that appear next to search results as part of the new Knowledge Graph. Here’s the video that accompanied the post (note the reference to a “panel” in the presentation):

Continue reading

Building Google’s Knowledge Base and Identifying Locations in Web Pages

When we talk about indexing and crawling content on the Web, it’s usually within the context of pages being ranked on the basis of a number of signals found on Web pages that might be ranked in response to queries. Google has told us that the future of search involves Knowledge Bases, and the indexing of Things, Not Strings. Gianluca Fiorelli explored Google’s ideas of Search in the Knowledge Graph Era earlier this week.

A few years back, I wrote some posts about some Google Patents that explored how Google might be extracting and visualizing facts, and using Data Janitors to process that information and clean it up and sort it. Google was granted another patent this week that’s very much related, looking at how Google might understand locations for places collected from Web pages. One of the inventors, Andrew Hogue, gave this Google Tech Talk presentation last year:

Continue reading

Search Engines and the Most Popular Search Terms

When you walk into the lobby of Building 42 at the Googleplex, you can see a display that shows you queries entered into the search engine at any one time. It’s a mesmerizing sight, and I found myself wondering about the people and motivations behind some of the search terms I saw flowing down the screen.

Imagine that, instead of seeing one query at a time, that search information was analyzed and queries were bundled together to give us more meaning.

Can search engines be used to tell us what the world is thinking at any one time? Would looking at the most popular keywords or queries that people type into a search engine provide us with some insights?

Popular Search Information from Search Engines

Continue reading

How Google Sets Works

A tool from Google that is often overlooked is Google Sets (no longer available), which allows you to “automatically create sets of items from a few examples.”

Google Sets was one of the first applications in the Google Labs (no longer available) pages.

Those pages are “Google’s Technology Playground,” and contain a number of programs that may or may not be tomorrow’s useful applications from the search engine. As Google tells us,

Google labs showcases a few of our favorite ideas that aren’t quite ready for prime time. Your feedback can help us improve them. Please play with these prototypes and send your comments directly to the Googlers who developed them.

Google was granted a patent this week on the process behind Google Sets, and the patent document provides some details on how the program finds additional words based on “items from a set of things” that you enter.

Continue reading

The Oracle at Yahoo: Using Yahoo News to Search the Future

Imagine exploring millions and millions of news pages and other documents to find information about events that are scheduled to happen in the future, to help predict the future.

The oracle Sibyl at Delphi

This kind of future search, or future retrieval, might be able to support the making of decisions in many different fields.

News information could be used to obtain information about possible future events, and that information could be made searchable, so that it can help people plan for the future.

The Yahoo patent application is:

Continue reading

How Google May Blend Information From Feeds and Extracted Data For Search Results

In Google’s search results, depending upon your query, when and where you are searching, and what your browser and search engine settings might be, you may receive a different set of search results than other folks performing a search using the same query terms.

And those results may include a mix of links and images from different data sources, including Web results, images, advertisements, local businesses, books, products, and others.

Google’s Universal Search provides a blended mix of results, incorporating results from a number of different data repositories into a single set of search results.

While ads are usually segmented from other results, the remainder may be mixed together on results pages. David Bailey, on the Official Google Blog, provided a glimpse of how those results came to be blended together in Behind the scenes with universal search. He provided an even more detailed view in a guest post at Search Engine Land titled An Insider’s View Of Google Universal Search.

Continue reading