Search Using Structured Data
Structured data is information set out in a way that makes it easy for a search engine to read. Some examples include XML markup in XML sitemaps and schema vocabulary found in JSON-LD scripts.
A search engine that answers questions by crawling and indexing facts found within structured data on a site works differently from one that looks at the words used in a query and tries to return documents that contain those same words, hoping that such a matching of strings might turn up an actual answer to the informational need that inspired the query in the first place. This flowchart from a 2017 Google patent shows how search using structured data works:
In Schema, Structured Data, and Scattered Databases such as the World Wide Web, I talked about the DIPRE algorithm in a patent from Sergey Brin, as I described in the post, Google’s First Semantic Search Invention was Patented in 1999. That patent and algorithm described how the web might be crawled to collect pattern and relations information about specific facts, in that case about books. In the Google patent on structured data, we see how Google might look for factual information set out in semi-structured data such as JSON-LD, to be able to answer queries about facts, such as, “What is a book, by Ernest Hemingway, published in 1948-1952?”
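To make the idea of answering a fact query from JSON-LD a little more concrete, here is a minimal sketch in Python. It is not the method claimed in the patent. It only shows how facts marked up with schema.org vocabulary could be filtered to answer the Hemingway query. The sample markup, the function name `books_by_author_between`, and the assumption that `author` is always a nested object with a `name` are mine, for illustration only.

```python
import json

# Hypothetical JSON-LD, as it might appear inside
# <script type="application/ld+json"> tags on book pages.
JSON_LD = """
[
  {"@context": "https://schema.org", "@type": "Book",
   "name": "Across the River and into the Trees",
   "author": {"@type": "Person", "name": "Ernest Hemingway"},
   "datePublished": "1950"},
  {"@context": "https://schema.org", "@type": "Book",
   "name": "The Old Man and the Sea",
   "author": {"@type": "Person", "name": "Ernest Hemingway"},
   "datePublished": "1952"},
  {"@context": "https://schema.org", "@type": "Book",
   "name": "A Farewell to Arms",
   "author": {"@type": "Person", "name": "Ernest Hemingway"},
   "datePublished": "1929"},
  {"@context": "https://schema.org", "@type": "Book",
   "name": "Nineteen Eighty-Four",
   "author": {"@type": "Person", "name": "George Orwell"},
   "datePublished": "1949"}
]
"""


def books_by_author_between(items, author, first_year, last_year):
    """Return names of Book items by `author` published in the year range."""
    matches = []
    for item in items:
        if item.get("@type") != "Book":
            continue
        # Assumption: author is a nested Person object with a "name".
        if item.get("author", {}).get("name") != author:
            continue
        # datePublished may be "YYYY" or "YYYY-MM-DD"; keep only the year.
        year = int(item.get("datePublished", "0")[:4])
        if first_year <= year <= last_year:
            matches.append(item["name"])
    return matches


items = json.loads(JSON_LD)
result = books_by_author_between(items, "Ernest Hemingway", 1948, 1952)
print(result)
```

The point of the sketch is that the query is answered by comparing typed attribute values (author, publication date) rather than by matching the words of the query against the text of a page.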
Continue reading “Google Patent on Structured Data Focuses upon JSON-LD”
Visiting Seattle to Speak about Structured Data
I spoke at SMX Advanced this week on Schema markup and Structured Data, as part of an introduction to its use at Google.
I had the chance to visit Seattle, and tour some of it. I took some photos, but would like to go back sometime and take a few more, and see more of the city.
One of the places that I did want to see was Pike Place Market. It was a couple of blocks away from the hotel I stayed at (the Marriott Waterfront).
It is a combination fish and produce market, and is home to one of the earliest Starbucks stores.
I could see living near the market and shopping there regularly. It has a comfortable feel to it.
Continue reading “Schema, Structured Data, and Scattered Databases such as the World Wide Web”
How Query Streams Might be Used to Build Ontologies
What are query stream ontologies, and how might they change search?
Search engines trained us to use keywords when we searched – to try to guess what words or phrases might be the best ones to use to find something we are interested in, something we might have a situational or informational need to find out more about. Keywords were an important and essential part of SEO – trying to get pages to rank highly in search results for certain keywords found in queries that people would search for. SEOs still optimize pages for keywords, hoping to use a combination of information retrieval relevance scores and link-based PageRank scores to get pages to rank highly in search results.
With Google moving towards a knowledge-based attempt to find “things” rather than “strings”, we are seeing patents that focus upon returning results that provide answers to questions in search results. One of those from January describes how query stream ontologies might be created from searchers’ queries and used to respond to fact-based questions using information about attributes of entities.
There is a white paper from Google, co-authored by the inventors of this patent and published around the time the patent was filed in 2014, that is worth spending time reading through. The paper is titled Biperpedia: An Ontology for Search Applications.
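As a rough illustration of how attributes of entities might be mined from a stream of queries, here is a minimal Python sketch. It is not the method from the patent or the paper. The queries, the tiny entity-to-class lookup, and the single "attribute of entity" pattern are all hypothetical and far simpler than anything a search engine would use.

```python
import re
from collections import Counter, defaultdict

# Hypothetical query stream.
QUERIES = [
    "population of france",
    "capital of france",
    "what is the population of brazil",
    "capital of brazil",
    "coffee production of brazil",
    "weather tomorrow",
]

# Hypothetical lookup from known entities to their class.
ENTITY_CLASS = {"france": "Country", "brazil": "Country"}

# One simple pattern: "[what is the] <attribute> of <entity>".
PATTERN = re.compile(r"^(?:what is the )?(?P<attribute>.+?) of (?P<entity>.+)$")


def attribute_counts(queries):
    """Count candidate attributes per entity class seen in the queries."""
    counts = defaultdict(Counter)
    for query in queries:
        match = PATTERN.match(query.lower().strip())
        if not match:
            continue
        entity_class = ENTITY_CLASS.get(match.group("entity"))
        if entity_class:
            counts[entity_class][match.group("attribute")] += 1
    return counts


counts = attribute_counts(QUERIES)
print(counts["Country"].most_common())
```

Attributes that many searchers ask about for many entities of the same class (here, "population" and "capital" for countries) become candidates for that class in an ontology, while one-off queries that fit no pattern are ignored.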
Continue reading “3 Ways Query Stream Ontologies Change Search”
I recently bought a lemon tree and wanted to learn how to care for it. I started asking about it at Google, which provided me with other questions and answers related to caring for a lemon tree. As I clicked upon some of those, others were revealed that gave me more information that was helpful.
Last March, I wrote a post about Related Questions at Google, Google’s Related Questions Patent or ‘People Also Ask’ Questions.
Related Questions Patent Updated to Include a Question Graph
As Barry Schwartz noted recently at Search Engine Land, in the post Google launches a new look for ‘people also search for’ search refinements, Google is now also showing alternative query refinements as ‘People Also Search For’ listings. That was enough to have me look to see if the original “Related Questions” patent was updated by Google. It was. A continuation patent was granted in June of last year, with the same name but updated claims.
Continue reading “Related Questions now use a Question Graph and are Joined by ‘People Also Search For’ Refinements”
I went to the Pubcon 2017 Conference this week in Las Vegas, Nevada, and gave a presentation about Semantic Search topics based upon white papers and patents from Google. My focus was on things such as Context Vectors and Phrase-Based Indexing.
I promised in social media that I would post the presentation on my blog so that I could answer questions if anyone had any.
I’ve been doing Semantic keyword research like this for years: I look at other pages that rank well for keyword terms that I want to use, identify phrases and terms that tend to appear upon those pages, and include them on pages that I am trying to optimize. It made a lot of sense to start doing that after reading about phrase-based indexing in 2005 and later.
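A minimal Python sketch of that kind of research might look like the following. It simply counts which two-word phrases appear on more than one well-ranking page. This is plain co-occurrence counting, not Google’s phrase-based indexing, and the page texts and the function names `phrases` and `shared_phrases` are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical text from pages that rank well for a target keyword.
PAGES = [
    "Lemon tree care starts with well drained soil and full sun.",
    "For good lemon tree care, use well drained soil and water deeply.",
    "Citrus fertilizer and full sun keep a lemon tree healthy.",
]


def phrases(text, n=2):
    """Return the set of n-word phrases that occur in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(pages, n=2, min_pages=2):
    """Return phrases that appear on at least `min_pages` of the pages."""
    page_counts = Counter()
    for page in pages:
        # A set per page, so each phrase is counted once per page.
        page_counts.update(phrases(page, n))
    return sorted(p for p, c in page_counts.items() if c >= min_pages)


result = shared_phrases(PAGES)
print(result)
```

Phrases that keep turning up across the pages are candidates to work into a page on the same topic. A real version would want to filter out phrases made mostly of stop words, such as "soil and".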
Some of the terms I see when I search for Semantic Keyword Research include such things as “improve your rankings,” “conducting keyword research,” and “smarter content.” I’m also seeing phrases that I’m not a fan of, such as “LSI Keywords,” which has as much scientific credibility as Keyword Density, which is next to none. Researchers from Bell Labs wrote a white paper about Latent Semantic Indexing in 1990, and it was something used with small (fewer than 10,000 documents) and static collections of documents (the web is constantly changing and hasn’t been that small for a long time).
Continue reading “Semantic Keyword Research and Topic Models”
When Google crawls the Web, it extracts facts from content on the pages it finds as well as links on pages. How much information does it extract about facts on the Web? Microsoft showed off an object-based search about 10 years ago, in the paper, Object-Level Ranking: Bringing Order to Web Objects.
The team from Microsoft Research Asia tells us in that paper:
Existing Web search engines generally treat a whole Web page as the unit for retrieval and consuming. However, there are various kinds of objects embedded in the static Web pages or Web databases. Typical objects are products, people, papers, organizations, etc. We can imagine that if these objects can be extracted and integrated from the Web, powerful object-level search engines can be built to meet users’ information needs more precisely, especially for some specific domains.
Continue reading “Google Patents Extracting Facts from the Web”