Many government Web sites have made the data that they collect and compile freely available to the public. The licenses under which that data has been released are described on the following pages:
If you are considering starting a project using that kind of data, you should read the Open Data Handbook, which goes into considerable detail. Much more information is available on Data.gov, which offers a broad overview of the topics its data covers, including:
A Google patent granted this week describes how Google might try to understand Entities that appear on Web pages, and how that awareness might influence the results that the search engine shows.
An Entity is a specifically named person, place, or thing (including ideas and objects) that could be connected to other entities based upon relationships between them. Some pages may make certain Entities the main Subject of a page, while others may include additional information about entities that are related in some manner to those first entities. When some entities appear on pages, they may be presented in an ambiguous manner that doesn’t make them the main topic of the page they appear upon.
Entities are said to exist in a graph that connects them to other entities based upon relationships between them. For instance, Google and Bing are both Search Engines, both internet domains, both employers of many search engineers, and both have CEOs, Vice Presidents, Marketing staff, headquarters, data centers, and Web indexes. There are a lot of related entities that might show up on Web pages about both.
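To make the idea of an entity graph a little more concrete, here is a minimal sketch in Python. It is only an illustration and is not taken from the patent: the `EntityGraph` class, its method names, and the relation labels (`is_a`, `owned_by`, `headquartered_in`) are all assumptions made for this example.

```python
from collections import defaultdict


class EntityGraph:
    """Hypothetical toy entity graph: entities linked by labeled relationships."""

    def __init__(self):
        # Maps an entity name to a set of (relation, other_entity) pairs.
        self._edges = defaultdict(set)

    def add_relation(self, subject, relation, obj):
        self._edges[subject].add((relation, obj))

    def related(self, entity):
        # Use .get so that looking up an unknown entity does not create it.
        return set(self._edges.get(entity, set()))

    def shared(self, a, b):
        # Relationships that two entities have in common.
        return self.related(a) & self.related(b)


graph = EntityGraph()
for engine in ("Google", "Bing"):
    graph.add_relation(engine, "is_a", "search engine")
    graph.add_relation(engine, "is_a", "internet domain")
graph.add_relation("Google", "headquartered_in", "Mountain View")
graph.add_relation("Bing", "owned_by", "Microsoft")

shared = graph.shared("Google", "Bing")
```

Here `shared` holds the two `is_a` relationships that Google and Bing have in common, which is the kind of overlap that could lead a search engine to treat the two as related entities.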
This view of Entities being related to each other, and belonging to an “Entity Graph,” is very similar to what is described in the Microsoft patent I wrote about recently in How Bing May Expand Queries Based upon Finding Entities Within them. A number of the ideas behind that patent and this one are similar, in that some knowledge about an entity might cause a search engine to display information about related entities.
The Google patent “Providing Knowledge Panels With Search Results” includes a reference to an earlier Google patent filing describing Knowledge Cards in depth. The provisional patent application is titled, “Apparatus and Method for Supplying Search Results with a knowledge Card”, and it is identified as Patent Application No. 61/515,305, filed on Aug. 4, 2011.
This provisional patent is not linkable from the Web, otherwise I would provide a link to it.
It is supposedly “incorporated fully” into that later patent filing, but a lot of details about what a knowledge card is have been left out of the later patent filing. I wrote about that later patent in a post titled, How Google Decides What to Know in Knowledge Graph Results, but the patent specifically about knowledge cards contains information not in the later patent.
Knowledge Panel results are part of Google’s Semantic Web search results, which include a mix of result types such as Direct Answers, Structured Snippets, and Rich Snippets. They are part of an evolution of search results happening at Google and at Microsoft’s Bing that goes well beyond yesterday’s 10 blue links. I’ll be following this post with one about the rich search results that show up in response to queries at Bing.
In one of those posts, I write about a paper (pdf) that the inventors of that patent co-authored which describes ways that Google was finding and extracting facts from pages to include in a repository of facts.
When someone searches the web, and asks a question such as “what is the capital of Poland” or “what is the birth date of George Washington” a web search engine such as Google may not be very helpful in providing an answer if it provides a list of web pages that might answer that query instead of an actual answer. People in the SEO community have been referring to such answers as “direct answers.”
A patent granted to Google this week describes how Google indexes data across the web, and may look to a large collection of facts (in a fact repository such as a knowledge graph) to check upon and verify such answers, so that it can deliver them with more confidence and certainty, like in the answer to the question about George Washington’s birthday shown above.
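One way to picture checking an answer against many facts is to count how many independent sources agree on it. The sketch below is an assumption made for illustration, not the method the patent describes: the function name `best_supported_answer`, the `min_sources` threshold, and the example source names are all hypothetical.

```python
from collections import Counter


def best_supported_answer(candidates, min_sources=2):
    """Return (answer, support) for the most widely supported answer, or None.

    `candidates` is a list of (source, answer) pairs. Each source counts at
    most once per answer, and answers are compared case-insensitively.
    """
    seen = set()
    votes = Counter()
    for source, answer in candidates:
        key = (source, answer.strip().lower())
        if key in seen:
            continue  # The same source repeating itself adds no new support.
        seen.add(key)
        votes[key[1]] += 1
    if not votes:
        return None
    answer, support = votes.most_common(1)[0]
    # Only answer with confidence when enough distinct sources agree.
    return (answer, support) if support >= min_sources else None


# Hypothetical extracted facts for "what is the capital of Poland".
candidates = [
    ("encyclopedia.example", "Warsaw"),
    ("travel.example", "warsaw"),
    ("news.example", "Warsaw"),
    ("forum.example", "Krakow"),
    ("forum.example", "Krakow"),
]
result = best_supported_answer(candidates)
```

With these made-up inputs, three distinct sources agree on Warsaw, while the repeated forum claim counts only once, so the answer supported by several sources wins over one that comes from a single place.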
The patent tells us that some efforts to build a search engine that can “provide quick answers to factual questions have their own shortcomings.” One of these is that the answers may come from a single source, such as “a particular encyclopedia.” This is perceived as a shortcoming because it is:
You are cloxacillin, a kind of medication and an entity that some people may not know a lot about, but part of a bigger class of medicines that people are familiar with. And you’re taking a trip through a search engine, as someone has recently been prescribed you, and they want to know more about you.
They copy your spelling from the bottle they got at the pharmacy. They couldn’t read the handwriting of the doctor who initially prescribed it. Good thing pharmacists are trained in reading doctors’ writing. Your name is spelled out, and with a press of the search button, knowledge is on its way.
At the time, Google had a Subscribed links program, where site owners could create specialized search results based upon certain patterns of queries, that would show additional content for a searcher. For some of those, you had to log into your Google Account and subscribe to certain links to be shown special content.
Oddly, some of those specialized search results didn’t require subscriptions, and didn’t require logging in. Much like these NFL sports Scores from this weekend:
A Google patent application explores how Google may answer factual questions from unstructured Web pages and results rather than from more structured sources such as Freebase or Wikipedia. The processes described in the patent are pretty interesting, and they might be more familiar to an SEO-trained audience than a Semantic Web one, like a result that ranks well because of a “query deserves freshness” approach.
They also avoid a problem for the search engines that I’ve been thinking about for weeks.