The World Wide Web is a vast resource for information. At the same time it is extremely distributed.
A particular type of data such as restaurant lists may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically.
We present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique we use it to extract a relation of (author, title) pairs from the World Wide Web.
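To make the pattern/relation duality a little more concrete, here is a toy sketch in Python. It is not the paper's implementation: the sample lines, the seed pair, and the function names (`occurrences_to_patterns`, `patterns_to_pairs`, `grow_relation`) are all made up for illustration, and it treats a pattern as nothing more than the literal text before, between, and after a known title and author on a single line.

```python
import re

# Hypothetical sample "web" text, one snippet per line (illustrative only).
LINES = [
    "<li><b>Foundation</b> by Isaac Asimov</li>",
    "<li><b>Dune</b> by Frank Herbert</li>",
    "<li><b>Neuromancer</b> by William Gibson</li>",
    "Dune (Frank Herbert)",
    "Hyperion (Dan Simmons)",
    "Unrelated line of text",
]
SEED = {("Isaac Asimov", "Foundation")}


def occurrences_to_patterns(lines, pairs):
    """Relation -> patterns: record the text around each known pair."""
    patterns = set()
    for line in lines:
        for author, title in pairs:
            t = line.find(title)
            if t == -1:
                continue
            a = line.find(author, t + len(title))
            if a == -1:
                continue
            middle = line[t + len(title):a]
            if not middle:
                continue  # nothing separates title and author; too ambiguous
            patterns.add((line[:t], middle, line[a + len(author):]))
    return patterns


def patterns_to_pairs(lines, patterns):
    """Patterns -> relation: every line that fits a pattern yields a pair."""
    pairs = set()
    for prefix, middle, suffix in patterns:
        regex = re.compile(
            re.escape(prefix) + "(.+?)" + re.escape(middle) + "(.+?)" + re.escape(suffix)
        )
        for line in lines:
            m = regex.fullmatch(line)
            if m:
                pairs.add((m.group(2), m.group(1)))  # (author, title)
    return pairs


def grow_relation(lines, seed_pairs, rounds=3):
    """Alternate between the two steps until no new pairs turn up."""
    pairs = set(seed_pairs)
    for _ in range(rounds):
        patterns = occurrences_to_patterns(lines, pairs)
        new_pairs = pairs | patterns_to_pairs(lines, patterns)
        if new_pairs == pairs:
            break
        pairs = new_pairs
    return pairs


RESULT = grow_relation(LINES, SEED)
```

Starting from the single Asimov seed, the first round learns the list-item pattern and picks up Dune and Neuromancer. The second round finds Dune again in the parenthesized format, learns that pattern, and picks up Hyperion. A real system would need to guard against overly general patterns, which this sketch does not attempt.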
A few years ago, I presented at SES San Jose and someone asked me what they should be keeping an eye upon in SEO. I told them “named entities.” I was reminded of that conversation as I gave a talk today about named entities and other semantics.
I presented this morning at San Jose McEnery Convention Center at the Semantic Technology and Business Conference (#SemTechBiz2014).
Barbara Starr and I gave a three-hour tutorial on Semantic Search to an enthusiastic and engaged audience. We also discussed which might be a better name for the tutorial, “Semantic Search” (the name it had) or “Semantic SEO” (what do you think?).
Here’s Barbara’s presentation, which is the first half of the tutorial. Thanks, Barbara – totally brilliant stuff:
On August 6th, Google announced that HTTPS was becoming a ranking signal for Google Search.
I’m not yet completely sure of the implications of a discovery I made earlier today, but I noticed in the USPTO assignment database that Google had been assigned a patent from AT&T in June, which was officially recorded on August 8th, 2014.
The patent is:
I’ve been saying for at least a couple of years that Google’s local search is a proof of concept for how the search giant can find and understand entities.
With local search, Google goes out and looks for a mention of a business on the Web, especially when it is accompanied by geographic location information. It collects and gathers facts related to businesses (entities are people, places, and things) and then it clusters information about the objects it finds to make sure that those mentions across the Web are all referring to the same places.
If you start reading about local search, you’ll see people referring to the importance of consistency in how you present address information for a business, and the same thing is true for entities.
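Here is a rough sketch of why that consistency matters when mentions are being clustered. This is my own simplified illustration in Python, not anything Google has described: the business, the sources, the abbreviation table, and the `normalize` and `cluster_mentions` functions are all hypothetical, and real systems would use far fuzzier matching than an exact key.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real one would be much larger
# (and "st" can also mean "saint", which this toy version ignores).
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "ste": "suite"}


def normalize(text):
    """Lowercase, drop apostrophes and punctuation, expand abbreviations."""
    text = re.sub(r"['’]", "", text.lower())
    words = re.sub(r"[^\w\s]", " ", text).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def cluster_mentions(mentions):
    """Group mentions whose normalized name and address are identical."""
    clusters = defaultdict(list)
    for mention in mentions:
        key = (normalize(mention["name"]), normalize(mention["address"]))
        clusters[key].append(mention["source"])
    return dict(clusters)


# Made-up mentions of one business found on three made-up sites.
MENTIONS = [
    {"name": "Joe's Pizza", "address": "12 Main St.", "source": "directory-a"},
    {"name": "Joes Pizza", "address": "12 Main Street", "source": "blog-b"},
    {"name": "Joe's Pizza", "address": "12 Main Street, Suite 4", "source": "review-c"},
]

CLUSTERS = cluster_mentions(MENTIONS)
```

The first two mentions collapse into one cluster once punctuation and abbreviations are smoothed over, but the third lands in a cluster of its own because of the extra suite number. That is the kind of split that inconsistent address information can cause.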
A couple of months ago, I wrote a post about a new patent from Google that was the first Google patent granted to Navneet Panda as an inventor. The patent described a complicated way for Google to judge the quality of websites, and my post was titled “Is this Really the Panda Patent?” Simon Penson wrote a follow-up post at Moz titled “The Panda Patent: Brand Mentions Are the Future of Link Building,” which looked at some other aspects of the patent.
On August 1st, Jayson Demers published a post to Forbes titled “Implied Links, Brand Mentions And The Future Of SEO Link Building,” which covers a lot of the same ground as Simon’s post. I contacted an editor at Forbes and stated that the post plagiarized Simon’s post. Jayson didn’t give me any credit for my post about the patent either, but Simon did.
When Google crawls the Web to collect information about objects or entities, it also collects facts about those entities. These facts are separated into different categories or attributes associated with those entities. For example, a book may have attributes such as an author, a publisher, a year published, a web site it can call home, a genre, and more.
Identifying Entities by their Attributes
A search that includes those attributes can be used to identify the entity the attributes might be associated with.
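As a minimal sketch of that idea, assuming a tiny in-memory store (the `ATTRIBUTE_STORE` dictionary and the `find_entities` function below are my own hypothetical names, not anything taken from the patent), a lookup by attributes might look like this in Python:

```python
# Hypothetical attribute store: entity name -> attribute/value facts.
ATTRIBUTE_STORE = {
    "Dune": {
        "type": "book",
        "author": "Frank Herbert",
        "year_published": 1965,
        "genre": "science fiction",
    },
    "Children of Dune": {
        "type": "book",
        "author": "Frank Herbert",
        "year_published": 1976,
        "genre": "science fiction",
    },
    "Neuromancer": {
        "type": "book",
        "author": "William Gibson",
        "year_published": 1984,
        "genre": "science fiction",
    },
}


def find_entities(store, **wanted):
    """Return the names of entities whose attributes match every wanted value."""
    return sorted(
        name
        for name, attributes in store.items()
        if all(attributes.get(key) == value for key, value in wanted.items())
    )


BY_AUTHOR = find_entities(ATTRIBUTE_STORE, author="Frank Herbert")
BY_AUTHOR_AND_YEAR = find_entities(
    ATTRIBUTE_STORE, author="Frank Herbert", year_published=1965
)
```

A single attribute (the author) narrows things down to two candidate books. Adding a second attribute (the year published) identifies one entity, which is the general shape of answering a query like "book by Frank Herbert published in 1965."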
Google was recently granted a patent that describes how those attributes could be searched within an attribute data store to find the entity. The patent shows how the process it describes might be used to answer some complex queries, and some interactive Answerbox-type queries. The issue that this patent addresses can be summed up in a single question:
Years ago, I started referring to search results as recommendations, seeing how they’ve been starting to look more and more like that part of a page at Amazon that says “people who viewed this book also looked at these books.”
When someone searches at a search engine, one of the things they look for in the results they receive is trustworthy pages (or recommendations) that look (and are) legitimate. How does a search engine deliver pages that are trustworthy?
One way to do that might be to try to boost pages in search results that the search engine feels are more trustworthy – and Google developed a version of Trust Rank to do that. The inventor of Google’s Trust Rank (which differs from the version that Yahoo invented) is Ramanathan Guha.
As part of the regular business analysis that I do on an ongoing basis, I like to keep an eye out for acquisitions made by search engines, and look at the technology that those companies being acquired have filed patents for.
When I heard about Google’s acquisition of Skybox, I jumped to the assumption that low-level orbiting satellites might be used in a manner similar to Google’s Project Loon to spread internet access to a wider audience across the globe. Or they might be used to make Google Maps a lot better with high resolution and frequently updated satellite images.
And then I looked at the patent filings assigned to Skybox Imaging, and quashed those assumptions, or put them off as secondary reasons why Google might have purchased the satellite company.
How much of an impact might high resolution and very frequently updated satellite images have upon a business analysis?