Google Patent on Structured Data Focuses upon JSON-LD

Search Using Structured Data

Structured Data is information laid out in a way that makes it easy for a search engine to read. Examples include the XML markup in XML sitemaps and the schema vocabulary found in JSON-LD scripts.

A search engine that answers questions by crawling and indexing facts found within structured data on a site works differently than one that looks at the words used in a query and tries to return documents containing those same words, hoping that the matching strings will turn up an actual answer to the informational need that inspired the query in the first place. This flowchart from a 2017 Google patent shows how search using Structured Data works:

Flow Chart Showing Structured Data in a Search

In Schema, Structured Data, and Scattered Databases such as the World Wide Web, I talked about the DIPRE algorithm from a Sergey Brin patent, which I also described in the post Google’s First Semantic Search Invention was Patented in 1999. That patent and algorithm described how the web might be crawled to collect patterns and relations about specific facts; in that case, facts about books. In the Google patent on structured data, we see how Google might look for factual information set out in semi-structured data such as JSON-LD to answer queries about facts, such as “What is a book by Ernest Hemingway published in 1948-1952?”

This newer patent tells us that it might solve that book search in this manner:

In particular, for each encoded data item associated with a given identified schema, the system searches the locations in the encoded data item identified by the schema as storing values for the specified keys to identify encoded data items that store values for the specified keys that satisfy the requirements specified in the query. For example, if the query is for semi-structured data items that have a value “Ernest Hemingway” for an “author” key and that have values in a range of “1948-1952” for a “year published” key, the system can identify encoded data items that store a value corresponding to “Ernest Hemingway” in the location identified in the schema associated with the encoded data item as storing the value for the “author” key and that store a value in the range from “1948-1952” in the location identified in the schema associated with the encoded data item as storing the value for the “year published” key. Thus, the system can identify encoded data items that satisfy the query efficiently, i.e., without searching encoded data items that do not include values for each key specified in the received query and without searching locations in the encoded data items that are not identified as storing values for the specified keys.
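
To make that passage concrete, here is a minimal sketch of the kind of lookup it describes. This is my own illustration, not code from the patent: the `search` function, the dictionary-based schemas that map a key to a position in a tuple, and the sample data are all assumptions made for the example.

```python
def search(repository, requirements):
    """Return decoded items whose values satisfy every requirement.

    repository: list of (schema, encoded_items) pairs, where a schema maps
        each key to the position holding its value in an encoded item.
    requirements: dict mapping a key to a predicate on that key's value.
    """
    matches = []
    for schema, encoded_items in repository:
        # Skip every item under a schema that lacks one of the queried keys.
        if not all(key in schema for key in requirements):
            continue
        for item in encoded_items:
            # Read only the locations the schema identifies for the queried keys.
            if all(test(item[schema[key]]) for key, test in requirements.items()):
                matches.append({key: item[schema[key]] for key in schema})
    return matches


# Hypothetical schemas and encoded data items.
book_schema = {"title": 0, "author": 1, "year published": 2}
film_schema = {"title": 0, "director": 1}
repository = [
    (book_schema, [
        ("The Old Man and the Sea", "Ernest Hemingway", 1952),
        ("A Farewell to Arms", "Ernest Hemingway", 1929),
        ("Nineteen Eighty-Four", "George Orwell", 1949),
    ]),
    (film_schema, [("To Have and Have Not", "Howard Hawks")]),
]

query = {
    "author": lambda value: value == "Ernest Hemingway",
    "year published": lambda value: 1948 <= value <= 1952,
}
results = search(repository, query)
print(results)
```

In this sketch the film schema is never searched for the book query, because it has no location for an “author” or a “year published” key. That is the efficiency the quoted passage points to.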

Structured Data and JSON-LD

It was interesting to see Google come out with a patent about searching semi-structured data that focuses upon the use of JSON-LD. Google provides an example of JSON on one of its developer pages, Introduction to Structured Data.

As it tells us on that page:

This documentation describes which fields are required, recommended, or optional for structured data with special meaning to Google Search. Most Search structured data uses schema.org vocabulary, but you should rely on the documentation on developers.google.com as definitive for Google Search behavior, rather than the schema.org documentation. Attributes or objects not described here are not required by Google Search, even if marked as required by schema.org.

The page then points us to the Structured Data Testing Tool, to be used as you prepare pages for use with Structured Data. It also tells us that the Structured Data Report in Google Search Console can be helpful for checking on Structured Data after it has been set up; that report is what I usually look at when doing site audits.

The Schema.org website has had a lot of JSON-LD examples added to it, so it was interesting to see this patent focus upon the format. From the way the patent describes it, Google seems to like it:

Semi-structured data is self-describing data that does not conform to a static, predefined format. For example, one semi-structured data format is JavaScript Object Notation (JSON). A JSON data item generally includes one or more JSON objects, i.e., one or more unordered sets of key/value pairs. Another example semi-structured data format is Extensible Markup Language (XML). An XML data item generally includes one or more XML elements that define values for one or more keys.
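
As a small illustration of what “self-describing” means here, the sketch below reads the same set of facts from a JSON data item and from an XML data item. The two sample items and their key names are made up for the example; only Python’s standard `json` and `xml.etree.ElementTree` modules are used.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical facts, once as a JSON object and once as XML elements.
json_item = (
    '{"author": "Ernest Hemingway", '
    '"title": "The Old Man and the Sea", '
    '"year published": 1952}'
)
xml_item = (
    "<book>"
    "<author>Ernest Hemingway</author>"
    "<title>The Old Man and the Sea</title>"
    "<year_published>1952</year_published>"
    "</book>"
)

# A JSON object parses directly into an unordered set of key/value pairs.
from_json = json.loads(json_item)

# Each child XML element supplies a key (its tag) and a value (its text).
from_xml = {child.tag: child.text for child in ET.fromstring(xml_item)}

print(from_json["author"], from_xml["author"])
```

In both cases the keys travel with the data, so a program can find the “author” value without being told ahead of time what fields to expect.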

Machine Readable Extraction of Facts

I’ve used the analogy that XML sitemaps are machine-readable in a way that HTML sitemaps are not. In the same way, JSON-LD presents facts on a site in a machine-readable form, as opposed to content that is in HTML format. The patent tells us that this is its purpose:

In general, this specification describes techniques for extracting facts from collections of documents.

The patent discusses schemas that might be on a site, and key/value pairs that could be searched, and details about such a search of semi-structured data on a site:

The aspect further includes receiving a query for semi-structured data items, wherein the query specifies requirements for values for one or more keys; identifying schemas from the plurality of schemas that identify locations for values corresponding to each of the one or more keys; for each identified schema, searching the encoded data items associated with the schema to identify encoded data items that satisfy the query; and providing data identifying values from the encoded data items that satisfy the query in response to the query. Searching the encoded data items associated with the schema includes: searching, for each encoded data item associated with the schema, the locations in the encoded data item identified by the schema as storing values for the specified keys to identify whether the encoded data item stores values for the specified keys that satisfy the requirements specified in the query.

The patent providing details of the use of JSON-LD to provide a machine-readable set of facts on a site can be found here:

Storing semi-structured data
Inventors: Martin Probst
Assignee: Google Inc.
US Patent: 9,754,048
Granted: September 5, 2017
Filed: October 6, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for storing semi-structured data. One of the methods includes maintaining a plurality of schemas; receiving a first semi-structured data item; determining that the first semi-structured data item does not match any of the schemas in the plurality of schemas; and in response to determining that the first semi-structured data item does not match any of the schemas in the plurality of schemas: generating a new schema, encoding the first semi-structured data item in the first data format to generate the first new encoded data item in accordance with the new schema, storing the first new encoded data item in the data item repository, and associating the first new encoded data item with the new schema.
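
The storing steps in the abstract can be sketched roughly as follows. This is only an illustration under my own assumptions: the `SchemaRepository` class is hypothetical, a “schema” is simplified to the sorted tuple of keys in an item, and an “encoded data item” is simplified to a tuple of values in that key order. The patent’s actual matching rules and encoding format are not reproduced here.

```python
class SchemaRepository:
    """Toy store that groups encoded items under the schema they match."""

    def __init__(self):
        # Maps a schema (sorted tuple of keys) to its list of encoded items.
        self.items_by_schema = {}

    def store(self, item):
        """Encode a semi-structured item (a dict) and store it.

        Returns True if a new schema had to be generated for the item.
        """
        schema = tuple(sorted(item))
        is_new = schema not in self.items_by_schema
        if is_new:
            # No maintained schema matches, so generate a new one.
            self.items_by_schema[schema] = []
        # Encode the item as values laid out in the schema's key order,
        # then associate the encoded item with that schema.
        encoded = tuple(item[key] for key in schema)
        self.items_by_schema[schema].append(encoded)
        return is_new


repo = SchemaRepository()
first = repo.store({"author": "Ernest Hemingway", "title": "The Old Man and the Sea"})
second = repo.store({"title": "Nineteen Eighty-Four", "author": "George Orwell"})
third = repo.store({"title": "To Have and Have Not", "director": "Howard Hawks"})
print(first, second, third)
```

The first book triggers a new schema, the second book reuses it because it has the same keys, and the film triggers another new schema because its keys differ.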

Takeaways on Structured Data Use

By using Structured Data, such as Schema vocabulary in JSON-LD format, you provide precise facts in key/value pairs as an alternative to the HTML-based content on the pages of a site. Make sure that you follow the Structured Data General Guidelines from Google when you add it to a site. That page tells us that pages that don’t follow the guidelines may not rank as highly, or may become ineligible for rich results in Google SERPs.
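
For anyone who has not written JSON-LD before, here is a minimal sketch that builds a Schema.org `Book` description and wraps it in the script tag used to embed JSON-LD in a page. The book details are just sample values, and which properties Google requires or recommends for a given rich result should be checked against Google’s own documentation rather than taken from this example.

```python
import json

# Facts about the page's subject, expressed as Schema.org key/value pairs.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "The Old Man and the Sea",
    "author": {"@type": "Person", "name": "Ernest Hemingway"},
    "datePublished": "1952",
}

# JSON-LD is placed on a page inside a script element of this type.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(book, indent=2)
    + "\n</script>"
)
print(script_tag)
```

The output can be pasted into a page and then checked with one of the testing tools mentioned in this post.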

And if you are optimizing a site for Google, it also helps to optimize the same site for Bing, and it is good to see that Bing seems to like JSON-LD too. It has taken a while for Bing to get there (see Aaron Bradley’s post, An Open Letter to Bing Regarding JSON-LD). It appears that Bing has listened a little, adding some capacity to check on JSON-LD after it is deployed: Bing announces Bing AMP viewer & JSON-LD support in Bing Webmaster Tools. The Bing Markup Validator does not yet help with JSON-LD, but Bing Webmaster Tools now helps with debugging JSON-LD. I like using this Structured Data Linter myself.

30 thoughts on “Google Patent on Structured Data Focuses upon JSON-LD”

  1. It is a little sad though, that Google uses structured data to crowdsource production of trivial content for means of monopolization instead of promoting open data and a progressive semantic web. At least that’s what it seems like.

  2. I bet that patent might have something to do with Carlo Strozzi and his definition of NoSQL databases (one year before, 1998). What do you think?

  3. Hello,
    This is nice post for google patent on structured data structure and having right to see you here .

  4. Hi Bill,
    Structured data has always been a good thing to rank on the Search engines. Although, thanks for sharing such a wonderful information in the post.
    Thanks for the share.
    have a good week ahead.

  5. It’s interesting to me that Google is indicating that their requirements won’t necessarily match those of schema.org. Schema is open source, it’s supported by Google, and Google is arguably its biggest consumer of data. I’m surprised that schema.org wouldn’t adjust its requirements to match up with those of Google.

  6. Hi Bob, The Schema Community mailing list (highly recommended) shows a lot of opinions and views about Schema, and some of those differ. The Google help page isn’t surprising in that it is suggesting looking to Google developer pages as a primary resource for Schema, because the Schema site is a combined effort that Google doesn’t necessarily have control over. It is interesting seeing the discussions about how Schema should be set up on the mailing list though. A lot of it is people testing out ideas and bouncing them off of other people, and it is good seeing that take place.

  7. Hi Robin,

    Thanks. I was glad that I came across this patent, and surprised that it focused upon JSON-LD as much as it did (showing Google’s preference for that form of markup.)

    You have a good week, too.

  8. Thank you Emirodgar,

    I will have to do more research to get a better sense of that history, and what impact it may have had. Thank you for pointing that out.

  9. Hi Pat,

    I see people like Dan Brickley and Ramanathan Guha and Richard Wallace working on Schema related items with a seriousness about the Semantic Web. We will see where it goes.

  10. Martin Probst worked on the Angular.JS project and this patent may be related to his contributions to Angular rather than to a specific technology used for search indexing.

  11. Outstanding content Bill. Working on Schema related stuff can be quite tricky sometimes and this is the reason why I’m glad that I stumbled upon this. As always, your blog never failed to amaze me. Kudos to you.

  12. Hi Micheal,

    That may be the impetus behind him filing this patent, but reading through it, it is about search indexing, rather than indexing of JavaScript content.

  13. Hi,
    This is nice post for google patent on structured and having right to see you here .

  14. Great information to foster new thought and creative exploration. For some, this is a great fit. This post is helpful in determining whether or not to continue or go in a different direction. Excellent information!
    As always, your blog never failed to amaze me. Kudos to you.

  15. Thanks for sharing your valueable thoughts.This is nice post for google patent on structured data structure and having right to see you here .

  16. Thanks for sharing, great information. I never seen about term “google patent on structured data” but provide us such a great information. Again thanks.

  17. Really very happy to say, your post is very interesting to read. I never stop myself to say something about it. You’re doing a great job. Keep it up

  18. Thanks for sharing the blog on the structure of content. I really searching for this data from a few days ago. It is appropriate information to write content.

  19. I have used this method in some of my projects, and i found it is not showing in the search results and when i make a study about this, i found that it may take time to show it on search engines.

  20. Hello Bill

    Thanks for sharing valuable article for Structured Data.

    Structured Data is very useful to improve rankings in Google.

    Awesome tip here for SEO folks.

    Thanks for sharing great article.

  21. Thank you so much for sharing this information, it is easy to learn while reading such articles. As, the information you provide is easy to understand and also, gives a detail explanation of the topic. I would love to read more of your articles because they provide great knowledge that no one else provides.
