Author Markup, Schema.org and Patents, Oh My!

Google, Yahoo, and Bing have joined forces to let web publishers add extra HTML markup that gives their pages more structure, may make those pages easier to index, and may give publishers a little more control over what shows up in search results for those pages. There’s some controversy over the approach, and there are some questions about the impact of related patents that all three search engines hold or have filed for, so web publishers should be paying attention to the possible impacts of this initiative from the search giants.

Google’s Author Markup

Yesterday, Google announced that they were introducing a way to add HTML code to a page to indicate who the author of the page might be. This code would appear as part of a link pointing to an author’s page on the same site, so that a search engine might associate the content of that page with the author who wrote it. The announcement was made in the Google Inside Search blog, in the post Authorship markup and web search, which told us how Google would use rel="author" and rel="me" to learn about who may have authored what on the Web.

In some ways, this announcement reminded me of an approach to understanding who wrote what that is described in a Google patent filing I wrote about a few years back at Search Engine Land (<a href="http://searchengineland.com/googles-agent-rank-patent-application-10487" rel="me">Agent Rank</a>). Agent Rank combines digital signatures for authors with metadata that would allow them to indicate that they were the authors of specific content on a page, whether a main content area blog post or article, a blog comment, or even an advertisement.

To indicate that someone is the author of an article on a page, a publisher would use rel="author" in a link to the author’s page on the same site. This is presently part of HTML5, and there’s a little more about how it can be implemented on the Web Hypertext Application Technology Working Group’s (WHATWG) pages. The group’s spokesperson is Ian Hickson, who works at Google on Web Standards development.

Here’s one example of how you might use rel="author" to indicate who the author of an article might be. At the bottom of an article on “example.com”, you might include a link like this:

Written by <a href="http://www.example.com/profiles/author-name" rel="author">Author Name</a>
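To make the mechanics a little more concrete, here is a minimal Python sketch of how a crawler might pick rel="author" links out of a page. It is only an illustration built on the standard library's HTML parser, not a description of how Google actually processes the markup, and the names RelLinkParser and find_author_links are hypothetical.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect (href, rel tokens) for every <a> element that has a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, e.g. rel="nofollow author"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if href and rel_tokens:
            self.links.append((href, rel_tokens))


def find_author_links(html):
    """Return the href of every link on the page marked rel="author"."""
    parser = RelLinkParser()
    parser.feed(html)
    return [href for href, rels in parser.links if "author" in rels]


page = (
    'Written by <a href="http://www.example.com/profiles/author-name" '
    'rel="author">Author Name</a>'
)
print(find_author_links(page))
```

Because rel can carry more than one value, the sketch splits the attribute into tokens rather than comparing the whole string to "author".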

It’s also possible to use HTML markup to tell Google about the author of a page across sites, or to indicate that an author’s profile page on one site is associated with another site, through the XFN rel="me" attribute and value. For instance, I wrote a number of posts in the past at Search Engine Land, and I might link to a page on that site that collects links to all of those posts in one place using a rel="me" attribute and value, like this:

My posts at <a href="http://searchengineland.com/author/bill-slawski" rel="me">Search Engine Land</a>

I might also point from that Search Engine Land profile page to my site using a link like this:

Read more about <a href="http://www.seobythesea.com/" rel="me">Bill Slawski on SEO by the Sea</a>
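One plausible way a search engine might treat a pair of links like these is to connect two profile pages only when each points to the other with rel="me". The Python sketch below works under that assumption, which the announcement as described here doesn't spell out. The function names and the in-memory pages dictionary (standing in for fetched pages) are hypothetical.

```python
from html.parser import HTMLParser


class MeLinkParser(HTMLParser):
    """Collect href values of <a> elements whose rel attribute contains "me"."""

    def __init__(self):
        super().__init__()
        self.me_links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if href and "me" in rel_tokens:
            # Drop a trailing slash so "/path" and "/path/" compare equal.
            self.me_links.add(href.rstrip("/"))


def me_links(html):
    parser = MeLinkParser()
    parser.feed(html)
    return parser.me_links


def same_author(pages, url_a, url_b):
    """True only if each page links to the other with rel="me" (assumed rule)."""
    a, b = url_a.rstrip("/"), url_b.rstrip("/")
    return b in me_links(pages[url_a]) and a in me_links(pages[url_b])


# Stand-in for fetched pages: URL -> HTML body.
pages = {
    "http://www.seobythesea.com/": (
        'My posts at <a href="http://searchengineland.com/author/bill-slawski" '
        'rel="me">Search Engine Land</a>'
    ),
    "http://searchengineland.com/author/bill-slawski": (
        'Read more about <a href="http://www.seobythesea.com/" '
        'rel="me">Bill Slawski on SEO by the Sea</a>'
    ),
}

print(same_author(
    pages,
    "http://www.seobythesea.com/",
    "http://searchengineland.com/author/bill-slawski",
))
```

Requiring links in both directions is a design choice in this sketch: a one-way rel="me" link could be added by anyone, while a reciprocal pair suggests that the same person controls both pages.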

One of the things that I liked about the Agent Rank approach that I referred to above was that it associated the ownership and authorship of content with digital signatures, to make it more likely that a person claiming ownership of specific content was actually the owner of that content. This digital signature could also be used in places like blog comments, so that the owner of a comment on someone else’s blog could be understood by a search engine to be the author of that content. The Agent Rank approach also provided a way to indicate using metadata that content syndicated elsewhere was done so with the knowledge and permission of the original author.

Google provides more information and examples on their Authorship support page.

Microdata and Schema.org

It’s big news when the major search engines join together to give site owners ways to make the content on their sites easier to index. We saw that happen in 2005 with rel="nofollow", and in 2006 with a joint initiative on XML Sitemaps at Sitemaps.org.

In the Google blog post on authorship markup, by Google Software Engineer Othar Hansson (keep that name in mind for the section on patents below), we are pointed to a Google Blog post from June 2, 2011, about another joint initiative from Google, Yahoo! and Microsoft on other markup that can be included on Web pages to help the search engines understand their content better. This markup uses Microdata to describe the content of your pages, and includes templates that can be used for different types of information. Those can be found on the site Schema.org.
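For readers who haven't seen Microdata before, the Python sketch below shows roughly what the markup looks like and how a program might read it. The sample HTML is my own illustration using the itemscope, itemtype, and itemprop attributes with the schema.org Person type. The parser is a deliberately simplified assumption that handles a single, non-nested item, not the full Microdata specification, and FlatMicrodataParser and parse_item are hypothetical names.

```python
from html.parser import HTMLParser

# Illustrative markup only: a Person item with three properties.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Author Name</span> is a
  <span itemprop="jobTitle">writer</span> at
  <a itemprop="url" href="http://www.example.com/profiles/author-name">example.com</a>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Read one non-nested microdata item: its itemtype and itemprop values."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None  # itemprop whose text is being collected

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "itemscope" in attr_map:
            self.item_type = attr_map.get("itemtype")
        prop = attr_map.get("itemprop")
        if not prop:
            return
        if tag == "a" and attr_map.get("href"):
            # For links, the property value is the href, not the link text.
            self.properties[prop] = attr_map["href"]
        else:
            self._current_prop = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._current_prop:
            self.properties[self._current_prop] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current_prop = None


def parse_item(html):
    parser = FlatMicrodataParser()
    parser.feed(html)
    props = {name: value.strip() for name, value in parser.properties.items()}
    return parser.item_type, props


item_type, props = parse_item(SAMPLE)
print(item_type)
print(props)
```

The point of the example is that the visible text of the page doesn't change. The extra attributes simply label which piece of text is the name, which is the job title, and so on.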

Google’s announcement about the initiative is at: Introducing schema.org: Search engines come together for a richer web. Yahoo introduces the topic on the Yahoo Developer Network at: Introducing schema.org: A Collaboration on Structured Data. Bing’s writeup can be found at: Introducing Schema.org: Bing, Google and Yahoo Unite to Build the Web of Objects.

There are actually a number of different standards in development, other than Microdata, that enable web publishers to include metadata about their page content, including the Resource Description Framework (RDF) and microformats, each of which has its own strengths. It’s not completely clear why schema.org chose to focus upon the use of Microdata rather than the others, though the FAQ page for schema.org provides an explanation of why they decided to start with Microdata in answering the question, “Q: Why microdata? Why not RDFa or microformats?”

I’ve seen more than a couple of blog posts about the choice to focus upon Microdata rather than the other developing standards, and a number of those provide some interesting criticisms of the choice. The Schema.org answer includes this point:

Microdata is the most recent well-known standard, created along with HTML5. It strikes a balance between extensibility and simplicity, and is most suitable for building the schema.org.

We’re also told that Google and Yahoo! will continue the support that they’ve had in the past for microformats and RDFa for certain applications, and that the search engines will be keeping an eye on usage of the other standards, and may look into supporting them as well if they become more popular.

I have a few questions when it comes to Schema.org:

  • Will the adoption of the Microdata format by the major search engines harm the development of the competing formats, and is the Web a little less rich because of it?
  • Will a resource like schema.org make it easier for site owners to adopt and use a standard that they might not have otherwise used?
  • Will search become better because site owners are making it easier for search engines to index content?
  • Will this metadata approach benefit people who are more technically proficient and might not have any trouble implementing it, at the cost of indexing content that might be more relevant and meaningful but which doesn’t use microdata?

I’m not sure of the answers to these questions, but I have seen a number of opinions expressed about them on the Web.

Patents, Oh My!

On the Terms of Service page at Schema.org, there’s a statement about patents that Google, Yahoo, and Microsoft might have regarding “markup of structured data.” Not quite sure what this means to people who might want to use the templates and formats at schema.org, or build applications or tools to make it easier for others to do so. Here’s the statement from the page:

In addition, if the Sponsors have patent claims that are necessarily infringed by including markup of structured data in a webpage, where the markup is based on and strictly complies with the Schema, they grant an option to receive a license under reasonable and non-discriminatory terms without royalty, solely for the purpose of including markup of structured data in a webpage, where the markup is based on and strictly complies with the Schema.

Does that make it sound like there might be a problem if someone comes out with a tool to make it easier for people to use the “schemas” at schema.org? Honestly, I’m not sure.

Do Google, Yahoo!, or Microsoft have any relevant patents? A quick search through the USPTO database tells us that they each have at least one, if not more. I didn’t do a comprehensive search, and it’s possible that each of the search engines has pending patent applications that aren’t published yet as well.

Google

Google’s patent, co-authored by Othar Hansson, provides a pretty detailed description of how markup language can be used to display search results, using templates to structure content found on the pages of a site. It provides some examples of how templates can be used, including the possibility that more than one template type (local vs. review) might be used on a single page:

The same web page content (e.g., the same search result display objects) can be rendered differently as search results based on the template used to render the search results. For instance, a local-business listings site might use a template specific for restaurants, which could include a “summary” field from the restaurant itself (“best sushi in Long Beach”), whereas a restaurant review site might use the a template specific for restaurant reviews, which could include a “summary” field providing content from a reviewer.

Thus, although much of the content on the web pages is the same (e.g., address, telephone number, hours, etc., of the restaurant), the templates used to render the search results cause the results to appear different to the user that submitted the search query.
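The idea in that passage, the same underlying data rendered differently depending on the template, can be sketched in a few lines of Python. This is only an illustration of the concept. The template names, field names, and sample restaurant below are hypothetical and are not taken from the patent filing.

```python
from string import Template

# Two hypothetical result templates that share most fields but present
# the "summary" field differently.
TEMPLATES = {
    "restaurant": Template(
        "$name | $address | $phone\nFrom the restaurant: $summary"
    ),
    "restaurant_review": Template(
        "Review of $name ($address, $phone)\nReviewer says: $summary"
    ),
}


def render_result(display_object, template_name):
    """Fill the named template with the fields of a search result display object."""
    return TEMPLATES[template_name].substitute(display_object)


# Content both sites have in common (made-up sample data).
shared = {
    "name": "Example Sushi",
    "address": "1 Example St, Long Beach",
    "phone": "555-0100",
}

# Each site supplies its own "summary" and chooses its own template.
listing_page = dict(shared, summary="best sushi in Long Beach")
review_page = dict(shared, summary="Fresh fish, slow service")

print(render_result(listing_page, "restaurant"))
print(render_result(review_page, "restaurant_review"))
```

Both results carry the same name, address, and phone number, but the template decides how they are laid out and how the summary is framed.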

Here’s a screenshot from the Google patent that shows a specific template in use, along with a link to a specific template type:

A screenshot of the format for HTML code that might be used in a template for businesses.

That patent is:

Providing Search Results
Invented by Othar Hansson, Ramanathan V. Guha, Walton W. Lin, Nicholas B. Weininger, Paul Haahr, and Kavi J. Goel
Assigned to Google
US Patent Application 20100114874
Published May 6, 2010
Filed: October 20, 2008

Abstract

Methods, systems, and apparatus, including computer program products, for responding to a search query received from a user. From a web page a search result display object and template are identified.

The search result display object specifies content available for display in a search result, and the template renders at least some of the content in the search result.

The search result is presented responsive to a search query received from a user, where the search result is associated with the web page containing the search result display object and template.

Yahoo

Yahoo! was granted a patent yesterday (June 7, 2011) that describes different templates that could be used to display results based upon different intents behind searches. Examples included in the description section of the patent include a range of template types.

  • A “Product” template may include values for original price, sale price, and date of a sale. A “Book” search result template could include information such as the author and whether the book was on a best sellers list.
  • A “Local” template can include values for address and phone number.
  • A “Reviews” template may include the date the review was published as well as a description of the review system.
  • An “Events” template may provide the date, address, and description of the event.
  • A “Discussion” template may tell us when a message was posted, how many replies were made, and a description of the message.

The Yahoo patent images include examples of how pages using these different types of templates might be displayed in search results:

A screenshot of yahoo search results from pages using different formatting templates.

Different types of shopping templates might be created for different types of businesses, such as one for consumer electronics stores, another for hotels, and a different one for airline travel.

Interestingly, the Yahoo patent focuses upon the use of RDF rather than Microdata.

Intent driven search result rich abstracts
Invented by Yi-An Lin, Youssef Billawala, Kevin Haas, Jan Pfeifer
Assigned to Yahoo!
US Patent 7,958,109
Granted June 7, 2011
Filed: February 6, 2009

Abstract

Techniques for providing useful information to a user in response to a search query are provided. Based on the search query, one or more potential intents of the user are identified and a plurality of matching resources are identified. For at least one matching resource, a particular abstract template is selected based on the one or more potential intents.

Each abstract (a) corresponds to a different intent than any other intent to which any other abstract template of the plurality of abstract templates corresponds, and (b) dictates a different manner of displaying information about a matching resource than any other manner of displaying dictated by any other abstract template of the plurality of abstract templates.

A search results page is generated and sent to the user. The search results page includes an abstract for the at least one matching resource. The abstract is displayed based on the particular abstract template.

Microsoft

Microsoft’s patent doesn’t provide the kinds of detailed examples that the Google and Yahoo patents do, but it does give us a fairly broad explanation of why they would patent a way for developers to be involved in how search results might be formatted:

Accordingly, a system and method are needed for allowing customization of web search result descriptions by consumers including users and developers. A solution can be created by leveraging existing technologies such as XML, HTML, metatags, and indexing. By creating such a solution, a search platform can be created that is capable of expansion and modification. Such a system would improve the search experience for consumers of search results including users and developers.

System and method for customization of search results
Invented by Ramez Naam
Assigned to Microsoft
US Patent 7,725,449
Granted May 25, 2010
Filed: December 2, 2004

Abstract

A system and method are provided for customizing search result descriptions for results returned by a search engine. The search result descriptions may be obtained through a search over a computer network. The system includes a search result description request component for enabling selection of particular data for retrieval by the search engine.

The system additionally includes a search result description generator for retrieving and returning the requested data. The system also includes a search result description renderer for displaying search result descriptions in a selected manner.

Conclusion

It’s still possible to create and publish web pages that don’t use microdata formats and have your pages rank well in search engines, but formats like the ones offered at Schema.org might help you have a little more control over how your search results appear, and may make it easier for the search engines to understand and index the content that you would like them to index.

I raised a few questions about the choice of the Microdata format by the search engines, and it’s a little surprising that Yahoo agreed to Microdata rather than RDF given that their patent focuses upon the use of RDF where the Google and Microsoft patents don’t pinpoint a specific format. Perhaps this may provide Yahoo with some incentive to have RDF templates added to Schema.org sooner rather than later?

There are a number of templates available for types of people, products, events, businesses, and things already at Schema.org, but I’m sure that more could and probably should be added. There’s a description of an Extension Mechanism that could be followed to add to schemas, but there might be some issues with whether or not the search engines will use those. We’re told on that extension page:

Of course, you can always create new schemas that are not at all tied to those on schema.org, and you should do this if the content of your domain is not covered by any of the schema.org types. As soon as your schemas gain sufficient adoption, search engines will start using their data.


34 thoughts on “Author Markup, Schema.org and Patents, Oh My!”

  1. Well, that’s what I call Bill’s work. Excellent once again. I really loved your approach. I’ve got nothing else to say; you made it a little bit easier for me :) ;) . Thanks man.

  2. Thank you for this post and new info. I looked at schema.org to see what it is.

    Alex J

  3. I am always satisfied with your work, Bill. I am really impressed by how you combine words and come up with very interesting and useful material.

  4. It has the makings of a cartel, if you ask me. Microdata has been developed for years by the W3C, but now the Big Three step in and try to control the show with their own schema.

  5. Thanks for an informative post, Bill. Do you have any idea whether the available “topics” in Schema.org will expand to be applicable to more industries? A lot of the sets are great for local and product-based businesses, but not necessarily to more online-based businesses.

  6. Hi shivabharathy,

    Thank you. Definitely some interesting developments with the adoption of the authorship markup and the metadata schemas. I’m not sure what impact, if any, the patents from the search engines will make, but the schema.org approach seems to favor Google the most if we look at the processes in the descriptions from the patents and how schema.org has been set up.

  7. Hi Alex,

    I’m wondering how many site owners have taken the time to look at schema.org. I suspect that some small percentage of developers have been following the standards being built around microformats, RDF, and microdata, but it’s not a widespread mainstream topic. None of the search engines have expressly come out and said that if you use the schemas that they’ve come up with, your sites might get more traffic. How many people will take the time to start using them? I’m not sure.

  8. Hi Andrew,

    Thanks. I’m a little tempted to come out with a followup or two on the nuts and bolts of implementing some of these schemas.

  9. Hi Brandon,

    There has been some friction in the development communities around this joint effort from the search engines. I’m torn between whether this will make the Web better for everyone, or whether it will mostly benefit those with deep pockets and/or the technical proficiency to implement the schemas. Is focusing upon microdata the right choice when there are many developers who have been working intently and intelligently for years on microformats and RDF?

  10. Hi Derek,

    There are definitely some limitations to the topics and business types offered on schema.org. The last paragraph and the ending quote of my post was an attempt to address that problem. No idea when additional industries will be included, or how much incentive many of them have to work together to try to develop those as well.

  11. Thank you. I had quite completely misunderstood rel="me" based on how I have seen the tag widely used.

    Prominent sites that I have looked at use rel="me" as essentially an inter-domain rel="author".
    Do you think this existing usage will slow down Google’s giving weight to rel="me"?

  12. Hi Johnathan,

    The Google Blog post on Authorship Markup wasn’t all that clear either. The Webmaster Central Help page on authorship markup did a better job explaining that the rel="me" could be used to point to other sites:

    An author page on a site can often link to other web pages about the same author, such as the author’s home page or a social networking profile. To tell Google that all these profiles represent the same person, use a rel="me" link to establish a link between profile pages.

    The confusion over how that could be used might slow the adoption of this down a little, but I don’t think that it will slow Google down in giving weight to it.

  13. @Bill how far down the line do you think google have got to indexing the web in a way that could be “retro fitted” to a microdata schema?

    What I mean is this: looking at google squared it would seem that google has got a lot down about websites in particular verticals. Say http://www.google.com/squared/search?q=dentist+london

    You can see in many Google Squares that the SE has a good handle on what content is equal from site to site.

    Have you seen patents referring to Google Sq’d?

  14. Hi JC,

    Google’s index is constantly changing and updating. It’s not a question of “retrofitting” the whole index as much as it is adding data about changes and updates as Google discovers it, and removing older data.

    I’ve been keeping an eye out for patents or papers that specifically mention Google Squared, and haven’t run across any that mention it by name, but Google has come out with some papers that do seem on point about how the information found in Squares is collected and organized. It involves a system that creates its own schema based upon relationships between data that it finds on the Web instead of creating a dictionary of schemas and then seeing what fits.

    A couple of Google papers to look at include:

    WebTables: Exploring the Power of Tables on the Web (pdf)

    See the examples of “Presidents of the United States” and “City Populations” in that paper.

    Another relevant paper is the following:

    Uncovering the Relational Web (pdf)

    Schema.org makes it easier for web publishers to use metadata on their pages, but the schema selected is already predefined by Google. Google Squared explores relationships between data on the Web to create its own schemas for that data. Both approach the same idea from different directions, but schema.org might make it easier for Google to find more relationships between pages and sites and data because of the addition of the schemas that it includes.

    You might also like this paper from Alon Halevy, who was a co-author of those two papers, and has been involved in efforts by Google to index deep web content as well:

    Why Your Data Won’t Mix: Semantic Heterogeneity (pdf)

    Making it easier for people to use a shared schema like those found at Schema.org can make it easier for search engines to use that information in meaningful ways.

  15. Thank you for sharing this great post. I read about Schema.org on the Distilled blog, and both articles have helped me to understand. As I come from a non-technical background, your mention of other blog posts has helped to a great extent. Cheers

  16. Thanks for this Bill. A commonly supported (by major search engines) markup is certainly welcome. I’m a fan of Microformats but hopefully schema.org ends the question of what markup to use; Microdata is the commonly supported markup therefore it makes sense to use Microdata henceforth.

  17. Oh my, I didn’t read about schema.org before and it looks like I missed a lot! Will have to do some more research now for sure. It looks like it’s more useful than I can imagine right now when checking results via Google Webmaster Tools.

  18. thank you for all that is good advice for SEO of our sites on the Internet, true google often changes the rules, but do not forget the B Ba SEO for a site, I am blown away by the minutes of the blog, forgiveness for my English a little off but I am a french keyboard in this position seo
    thank you
    friendly

    max Lenient

  19. Well written post and great research. Interesting read. From what I’ve seen so far, having schema microdata in your markup does give you a slight edge. You definitely get indexed quicker and all of my author pages shoot up in the top 5 results for the targeted long tail keywords with very few shares or back links.

  20. Hi Steve,

    Thank you. I agree that using schema microdata enables you to have a little more control over how Google might index your content. It also makes it easier for Google to capture data, and capture in the way that you hope they will present it. Thanks for sharing your experience with it.
