Search Using Structured Data
Structured Data is information formatted into a repository that a search engine can read easily. Examples include the XML markup in XML sitemaps and the Schema vocabulary found in JSON-LD scripts. It is distinct from semi-structured and unstructured data, which have less formatting.
A search engine that answers questions by crawling and indexing facts found within structured data on a site works differently than one that looks at the words used in a query and tries to return documents whose unstructured text contains the same words. That second kind of search engine hopes that matching strings will turn up an actual answer to the informational need that inspired the query in the first place. This flowchart from a 2017 Google patent shows how search using Structured Data works:
In Schema, Structured Data, and Scattered Databases such as the World Wide Web, I talked about the DIPRE algorithm in a patent from Sergey Brin, which I also described in the post Google’s First Semantic Search Invention was Patented in 1999. That patent and algorithm described how the web might be crawled to collect pattern and relation information about specific facts, in that case, facts about books. In the Google patent on structured data, we see how Google might look for factual information set out in structured data such as JSON-LD to answer queries about facts, such as “What is a book, by Ernest Hemingway, published in 1948-1952?”
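To make that kind of query concrete, here is a minimal sketch in Python of answering a factual question from JSON-LD rather than from matching words on a page. It is not the method from the patent. The sample markup, the PAGES list, and the books_by_author_between function are all assumptions made up for illustration; only the Schema.org property names (Book, author, name, datePublished) are real.

```python
import json

# Hypothetical JSON-LD blocks, as they might appear inside
# <script type="application/ld+json"> tags on four different pages.
PAGES = [
    """{"@context": "https://schema.org", "@type": "Book",
        "name": "For Whom the Bell Tolls",
        "author": {"@type": "Person", "name": "Ernest Hemingway"},
        "datePublished": "1940"}""",
    """{"@context": "https://schema.org", "@type": "Book",
        "name": "The Old Man and the Sea",
        "author": {"@type": "Person", "name": "Ernest Hemingway"},
        "datePublished": "1952"}""",
    """{"@context": "https://schema.org", "@type": "Book",
        "name": "Across the River and into the Trees",
        "author": {"@type": "Person", "name": "Ernest Hemingway"},
        "datePublished": "1950"}""",
    """{"@context": "https://schema.org", "@type": "Book",
        "name": "The Catcher in the Rye",
        "author": {"@type": "Person", "name": "J. D. Salinger"},
        "datePublished": "1951"}""",
]


def books_by_author_between(pages, author, first_year, last_year):
    """Return sorted titles of Book items by `author` published in the year range."""
    titles = []
    for raw in pages:
        item = json.loads(raw)
        if item.get("@type") != "Book":
            continue
        if item.get("author", {}).get("name") != author:
            continue
        # datePublished is assumed to start with a four-digit year.
        year = int(item["datePublished"][:4])
        if first_year <= year <= last_year:
            titles.append(item["name"])
    return sorted(titles)


print(books_by_author_between(PAGES, "Ernest Hemingway", 1948, 1952))
# ['Across the River and into the Trees', 'The Old Man and the Sea']
```

The point of the sketch is that the answer comes from comparing typed facts (an author and a publication year), not from finding pages that happen to contain the words in the query.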
Continue reading “Google Patent on Structured Data Focuses upon JSON-LD”
Visiting Seattle to Speak about Structured Data
I spoke at SMX Advanced this week on Schema markup and Structured Data, as part of an introduction to its use at Google.
I had the chance to visit Seattle, and tour some of it. I took some photos, but would like to go back sometime and take a few more, and see more of the city.
One of the places that I did want to see was Pike Place Market. It was a couple of blocks away from the hotel I stayed at (the Marriott Waterfront).
It is a combination fish and produce market, and is home to one of the earliest Starbucks.
I could see living near the market and shopping there regularly. It has a comfortable feel to it.
Continue reading “Schema, Structured Data, and Scattered Databases such as the World Wide Web”
I recently bought a lemon tree and wanted to learn how to care for it. I started asking about it at Google, which provided me with other questions and answers related to caring for a lemon tree. As I clicked upon some of those, others were revealed that gave me more information that was helpful.
Last March, I wrote a post about Related Questions at Google, Google’s Related Questions Patent or ‘People Also Ask’ Questions.
Related Questions Patent Updated to Include a Question Graph
As Barry Schwartz noted recently at Search Engine Land, Google is now also showing alternative query refinements as ‘People Also Search For’ listings, in the post, Google launches a new look for ‘people also search for’ search refinements. That was enough to have me look to see if the original “Related Questions” patent was updated by Google. It was. A continuation patent was granted in June of last year, with the same name, but updated claims.
Continue reading “Related Questions now use a Question Graph and are Joined by ‘People Also Search For’ Refinements”
I went to the Pubcon 2017 Conference this week in Las Vegas, Nevada, and gave a presentation about Semantic Search topics based upon white papers and patents from Google. My focus was on things such as Context Vectors and Phrase-Based Indexing.
I promised in social media that I would post the presentation on my blog so that I could answer questions if anyone had any.
I’ve been doing Semantic keyword research like this for years: I look at other pages that rank well for keyword terms that I want to use, identify phrases and terms that tend to appear upon those pages, and include them on pages that I am trying to optimize. It made a lot of sense to start doing that after reading about phrase-based indexing in 2005 and later.
Some of the terms I see when I search for Semantic Keyword Research include such things as “improve your rankings,” “conducting keyword research,” and “smarter content.” I’m also seeing phrases that I’m not a fan of, such as “LSI Keywords,” which has as much scientific credibility as Keyword Density, which is next to none. There were researchers from Bell Labs, in 1990, who wrote a white paper about Latent Semantic Indexing, which was something that was used with small (fewer than 10,000 documents) and static collections of documents (the web is constantly changing and hasn’t been that small for a long time).
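The manual process I described above, looking at well-ranking pages and noting the phrases they share, can be roughed out in a few lines of Python. This is only a sketch under simple assumptions: it counts two-word phrases that show up on more than one page, and it is not Google’s phrase-based indexing or any kind of topic model. The sample page text, the PAGES list, and the bigrams and shared_phrases functions are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical text taken from three pages that rank for the same query.
PAGES = [
    "Lemon tree care starts with full sun and well drained soil.",
    "Good lemon tree care means watering deeply and feeding often.",
    "Prune your lemon tree in spring.",
]


def bigrams(text):
    """Return the set of two-word phrases in a piece of text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(pair) for pair in zip(words, words[1:])}


def shared_phrases(pages, min_pages=2):
    """Return phrases that appear on at least `min_pages` of the pages."""
    counts = Counter()
    for text in pages:
        # bigrams() returns a set, so each page counts a phrase only once.
        counts.update(bigrams(text))
    return sorted(phrase for phrase, n in counts.items() if n >= min_pages)


print(shared_phrases(PAGES))
# ['lemon tree', 'tree care']
```

Counting how many pages a phrase appears on, rather than how many times it appears in total, keeps one page that repeats a phrase over and over from dominating the list.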
Continue reading “Semantic Keyword Research and Topic Models”
When Google crawls the Web, it extracts facts from content on the pages it finds as well as links on pages. How much information does it extract about facts on the Web? Microsoft showed off an object-based search about 10 years ago, in the paper, Object-Level Ranking: Bringing Order to Web Objects.
The team from Microsoft Research Asia tells us in that paper:
Existing Web search engines generally treat a whole Web page as the unit for retrieval and consuming. However, there are various kinds of objects embedded in the static Web pages or Web databases. Typical objects are products, people, papers, organizations, etc. We can imagine that if these objects can be extracted and integrated from the Web, powerful object-level search engines can be built to meet users’ information needs more precisely, especially for some specific domains.
Continue reading “Google Patents Extracting Facts from the Web”
I noticed a blog post published yesterday, November 2, 2016, and it looked helpful: Use JSON-LD to add Schema.org to your Website. Schema and structured data seem to be growing in importance on the Web, as we see more knowledge panels and rich snippets and product search results. I’ve been working knowledge panels into Site Audits. JSON-LD seems to be favored by Google for adding structured data to your web pages. See: What is JSON-LD? A Talk with Gregg Kellogg.
If you do SEO and aren’t familiar with GS1, you probably should be. They invented the use of bar codes in shopping. They also came up with GTINs (Global Trade Item Numbers), which are used online at places such as eBay, Amazon, and Google Product Search. A recent blog post by GS1 Vice President Rich Richardson is also worth reading: Why bar code numbers matter.
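As a rough illustration of how a GTIN can travel with a product in JSON-LD, here is a small Python sketch. It checks the GTIN check digit (weights of 3 and 1 alternating from the rightmost digit before the check digit) and then builds Product markup using the Schema.org gtin13 property. The function names, the product name, and the sample number are assumptions for illustration only; the number is simply a 13-digit value whose check digit works out.

```python
import json


def gtin_check_digit_ok(gtin):
    """Return True if `gtin` is 8, 12, 13, or 14 digits with a valid check digit."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        return False
    digits = [int(c) for c in gtin]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, 1, ... starting at the rightmost body digit.
    total = sum(d * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def product_json_ld(name, gtin13):
    """Build JSON-LD for a Product, refusing GTIN-13 values that fail the check."""
    if len(gtin13) != 13 or not gtin_check_digit_ok(gtin13):
        raise ValueError("not a valid GTIN-13: " + gtin13)
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,       # hypothetical product name
        "gtin13": gtin13,
    }, indent=2)


print(product_json_ld("Example Pen", "4006381333931"))
```

Validating the check digit before publishing the markup catches mistyped numbers early, since a single wrong digit changes the expected check digit.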
In February, GS1 published an extension to Schema for products. Extensions like this are how Search and SEO are growing. The Schema blog told us about it in:
Continue reading “GS1 Web Vocabulary Schema Workshops in California”