Search Using Structured Data
Structured Data is information formatted into a repository that a search engine can read easily. Examples include XML markup in XML sitemaps and schema vocabulary found in JSON-LD scripts. It is distinct from semi-structured and unstructured data, which have less formatting.
A search engine that answers questions by crawling and indexing facts found in a site's structured data works differently from one that looks at the words used in a query and returns documents of unstructured text containing those same words, hoping that such string matching surfaces an actual answer to the informational need that inspired the query in the first place. A flowchart from a 2017 Google patent illustrates how search using structured data works.
In “Schema, Structured Data, and Scattered Databases such as the World Wide Web,” I talked about the DIPRE algorithm in a patent from Sergey Brin, which I described in the post “Google’s First Semantic Search Invention was Patented in 1999.” That patent and algorithm described how the web might be crawled to collect patterns and relations about specific facts; in that case, facts about books. In the Google patent on structured data, we see how Google might look for factual information set out in structured data such as JSON-LD to answer queries about facts, such as “What is a book by Ernest Hemingway published in 1948-1952?”
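To make the Hemingway example concrete, here is a minimal sketch of answering that kind of fact query from JSON-LD on a page. It is not the patent's process: the page snippet, the `extract_items` and `books_by` helpers, and the simple regex used to find `application/ld+json` scripts are all hypothetical, and a real crawler would use a proper HTML parser.

```python
import json
import re

# Hypothetical page fragment containing schema.org Book markup in JSON-LD.
PAGE_HTML = """
<script type="application/ld+json">
[
  {"@context": "https://schema.org", "@type": "Book",
   "name": "For Whom the Bell Tolls",
   "author": {"@type": "Person", "name": "Ernest Hemingway"},
   "datePublished": "1940"},
  {"@context": "https://schema.org", "@type": "Book",
   "name": "Across the River and into the Trees",
   "author": {"@type": "Person", "name": "Ernest Hemingway"},
   "datePublished": "1950"},
  {"@context": "https://schema.org", "@type": "Book",
   "name": "The Old Man and the Sea",
   "author": {"@type": "Person", "name": "Ernest Hemingway"},
   "datePublished": "1952"}
]
</script>
"""

# Simplification: a regex instead of an HTML parser to find JSON-LD scripts.
LD_JSON = re.compile(r'<script type="application/ld\+json">(.*?)</script>', re.DOTALL)


def extract_items(html):
    """Collect every JSON-LD object found in the page's ld+json scripts."""
    items = []
    for block in LD_JSON.findall(html):
        data = json.loads(block)
        items.extend(data if isinstance(data, list) else [data])
    return items


def books_by(items, author, start_year, end_year):
    """Return names of Books by `author` published within the year range."""
    results = []
    for item in items:
        if item.get("@type") != "Book":
            continue
        who = item.get("author")
        name = who.get("name") if isinstance(who, dict) else who
        if name != author:
            continue
        year = int(str(item.get("datePublished", "0"))[:4])
        if start_year <= year <= end_year:
            results.append(item["name"])
    return results


print(books_by(extract_items(PAGE_HTML), "Ernest Hemingway", 1948, 1952))
# → ['Across the River and into the Trees', 'The Old Man and the Sea']
```

Because the facts are already labeled (type, author, publication date), the query becomes a filter over fields rather than a match on words.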
Continue reading “Google Patent on Structured Data Focuses upon JSON-LD”
Visiting Seattle to Speak about Structured Data
I spoke at SMX Advanced this week on Schema markup and Structured Data, as part of an introduction to how Google uses them.
I had the chance to visit Seattle and tour some of it. I took some photos, but would like to go back sometime to take a few more and see more of the city.
One of the places I did want to see was Pike Place Market. It was a couple of blocks away from the hotel I stayed at (the Marriott Waterfront).
It is a combination fish and produce market, and is home to one of the earliest Starbucks.
I could see living near the market and shopping there regularly. It has a comfortable feel to it.
Continue reading “Schema, Structured Data, and Scattered Databases such as the World Wide Web”
Google Introduces Combined Content Results
This new patent is about “combined content.” What does that mean exactly? When Google patents talk about paid search, they refer to those paid results as “content” rather than as advertisements. This patent is about how Google might combine paid search results with organic results in certain instances.
The recent patent from Google (Combining Content with Search Results) tells us how Google might identify when organic search results are about specific entities, such as brands. It may also recognize when paid results are about the same brands, such as when they promote products from those brands.
When a set of search results contains a high-ranking organic result from a specific brand and a paid search result from that same brand, the process described in the patent might merge the two into a single combined content result.
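A minimal sketch of that merging step, assuming each result has already been tagged with the brand it is about. The `Result` and `CombinedResult` classes, the `combine` function, and the `top_n` cutoff for "high ranking" are hypothetical illustrations, not the patent's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    title: str
    url: str
    brand: Optional[str]  # entity the result is about, if one was recognized
    paid: bool


@dataclass
class CombinedResult:
    brand: str
    organic: Result
    paid: Result


def combine(results, top_n=3):
    """Merge a paid result into a high-ranking organic result about the same brand.

    Organic results keep their order; only the top `top_n` (a hypothetical
    threshold) are eligible. Unmerged paid results are appended at the end,
    which is a simplification for illustration.
    """
    organic = [r for r in results if not r.paid]
    paid = [r for r in results if r.paid]
    used = set()
    output = []
    for rank, r in enumerate(organic):
        match = None
        if rank < top_n and r.brand is not None:
            for i, ad in enumerate(paid):
                if i not in used and ad.brand == r.brand:
                    match = i
                    break
        if match is None:
            output.append(r)
        else:
            used.add(match)
            output.append(CombinedResult(r.brand, r, paid[match]))
    output.extend(ad for i, ad in enumerate(paid) if i not in used)
    return output


results = [
    Result("Acme running shoes", "https://acme.example/shoes", "Acme", False),
    Result("Shoe reviews", "https://reviews.example/", None, False),
    Result("Acme sale", "https://ads.example/acme", "Acme", True),
    Result("Zeta boots", "https://ads.example/zeta", "Zeta", True),
]
for item in combine(results):
    print(type(item).__name__, getattr(item, "brand", None))
```

Here the Acme organic and paid results become one combined entry, while the review page and the unmatched Zeta ad are left as they were.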
Continue reading “Google to Offer Combined Content (Paid and Organic) Search Results”
PageRank Updated by Google
A popular search engine developed by Google Inc. of Mountain View, Calif. uses PageRank® as a page-quality metric for efficiently guiding the processes of web crawling, index selection, and web page ranking. Generally, the PageRank technique computes and assigns a PageRank score to each web page it encounters on the web, wherein the PageRank score serves as a measure of the relative quality of a given web page with respect to other web pages. PageRank generally ensures that important and high-quality web pages receive high PageRank scores, which enables a search engine to efficiently rank the search results based on their associated PageRank scores.
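For readers who haven't seen how a PageRank score gets computed, here is a minimal sketch of the classic power-iteration formulation with a damping factor. It illustrates the general technique the quoted passage describes, not the updated patent's distance-based method; the `pagerank` function, the 0.85 damping value, and the fixed iteration count are common textbook choices used here as assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Scores always sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its rank evenly across its outgoing links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


scores = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(sorted(scores, key=scores.get, reverse=True))
# → ['C', 'A', 'B']
```

Page C ranks highest because it receives links from both A and B, and A benefits from being the only page C links to.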
~ Producing a ranking for pages using distances in a web-link graph
A continuation patent showing PageRank updated was granted today. The original version of this PageRank patent was filed in 2006 and reminded me a lot of Yahoo’s TrustRank, which the patent’s applicants cite as one of the many documents this new version of the patent is based upon.
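The patent's title points to ranking pages by their distances in the link graph, which, like TrustRank, starts from a set of trusted seed pages. Below is a minimal sketch of that idea as a multi-source shortest-path (Dijkstra) computation. The link length used here, log(1 + number of outgoing links), is a hypothetical choice made for illustration (a link from a page with many outlinks counts as a longer step); the `seed_distances` and `rank_by_distance` functions are also hypothetical and are not taken from the patent.

```python
import heapq
import math


def seed_distances(links, seeds):
    """Shortest distance from the nearest seed page to every reachable page.

    `links` maps each page to the pages it links to. Link length is a
    hypothetical log(1 + out-degree) of the linking page.
    """
    dist = {s: 0.0 for s in seeds}
    heap = [(0.0, s) for s in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, math.inf):
            continue  # stale heap entry; a shorter path was already found
        targets = links.get(page, [])
        if not targets:
            continue
        length = math.log(1 + len(targets))
        for t in targets:
            nd = d + length
            if nd < dist.get(t, math.inf):
                dist[t] = nd
                heapq.heappush(heap, (nd, t))
    return dist


def rank_by_distance(links, seeds):
    """Closer to the seeds ranks higher; unreachable pages are omitted."""
    dist = seed_distances(links, seeds)
    return sorted(dist, key=lambda p: (dist[p], p))


links = {"seed": ["a", "b"], "a": ["c"], "b": ["c", "d", "e"]}
print(rank_by_distance(links, ["seed"]))
```

Page c ends up closer to the seed through a (which has one outlink) than through b (which has three), so its score reflects the better of the two paths.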
Continue reading “PageRank Updated”
How Query Streams Might be Used to Build Ontologies
What are query stream ontologies, and how might they change search?
Search engines trained us to use keywords when we searched, trying to guess which words or phrases might best find something we were interested in or had a situational or informational need to learn more about. Keywords were an important and essential part of SEO: trying to get pages to rank highly in search results for keywords found in the queries people searched for. SEOs still optimize pages for keywords, hoping to use a combination of information retrieval relevance scores and link-based PageRank scores to get pages to rank highly in search results.
With Google moving towards a knowledge-based attempt to find “things” rather than “strings,” we are seeing patents that focus upon returning answers to questions in search results. One of those, from January, describes how query stream ontologies might be created from searchers’ queries and used to respond to fact-based questions using information about attributes of entities.
The inventors of this patent also co-authored a Google white paper, published around the time the patent was filed in 2014, that is worth spending time reading through. The paper is titled “Biperpedia: An Ontology for Search Applications.”
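As a toy illustration of the general idea of mining entity attributes from a query stream, the sketch below matches queries of the form "attribute of entity," maps each entity to a class, and keeps the attributes that appear often enough for that class. It is not the patent's or Biperpedia's method: the `ENTITY_CLASS` table, the single query pattern, and the `min_support` threshold are hypothetical simplifications.

```python
import re
from collections import Counter, defaultdict

# Hypothetical entity-to-class lookup; a real system would use a knowledge base.
ENTITY_CLASS = {
    "france": "Country",
    "japan": "Country",
    "brazil": "Country",
    "the old man and the sea": "Book",
}

# Hypothetical pattern: "[what is] [the] <attribute> of <entity>[?]"
PATTERN = re.compile(r"^(?:what is |what's )?(?:the )?(.+?) of (.+?)\??$")


def mine_attributes(queries, min_support=2):
    """Count attribute mentions per entity class and keep frequent ones."""
    counts = defaultdict(Counter)
    for q in queries:
        m = PATTERN.match(q.strip().lower())
        if not m:
            continue
        attribute, entity = m.group(1), m.group(2)
        cls = ENTITY_CLASS.get(entity)
        if cls:
            counts[cls][attribute] += 1
    return {
        cls: sorted(a for a, c in attrs.items() if c >= min_support)
        for cls, attrs in counts.items()
    }


queries = [
    "population of France",
    "what is the population of Japan?",
    "capital of Brazil",
    "the capital of France",
    "gdp of Japan",
    "author of The Old Man and the Sea",
]
print(mine_attributes(queries))
```

Aggregating by class rather than by entity is what lets attributes seen for one country (say, population) become expected attributes for countries in general.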
Continue reading “3 Ways Query Stream Ontologies Change Search”