Google Autolink Patent

A patent application published at the end of last week, Providing useful information associated with an item in a document, appeared to describe how Google Autolink worked.

The United States Patent and Trademark Office assignment database shows that this document was assigned to Google in December of 2004, but, as closely as it seemed to describe how Autolink worked, I wasn’t completely convinced.

At least until I looked closer at the “figures” filed with the document. Note the “autolink” button on the bottom toolbar in the picture of a browser window below.

A browser window, with a Google Autolink button from the patent application.

There were a lot of great articles and blog posts written on Google’s Autolink when it was first described, and released onto the Google Toolbar, from Jason Kottke through Danny Sullivan. It’s hard to believe that more than a year has passed.

How does it work?

Pattern matching.

The toolbar might remove formatting from the page and analyze its contents to try to recognize information.
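
To make that first step concrete, here is a minimal sketch of stripping markup from a page so that only the readable text is left for analysis. It uses Python's standard html.parser module; the class name TextExtractor and the function strip_formatting are my own hypothetical names, and nothing here is taken from the patent application or from the actual toolbar.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores tags, scripts, and styles (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def strip_formatting(html):
    """Return the page text with markup removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


print(strip_formatting("<p>Ship to <b>123 Main</b> Street</p>"))
# prints: Ship to 123 Main Street
```

Joining the text chunks with spaces is a simplification: it keeps the example short, but it would also put a space inside an item that happens to be split across two tags.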

The items of information it may look for could include:

  • Postal addresses,
  • Telephone numbers,
  • Flight information,
  • Traffic information,
  • Product identification information,
  • Tracking numbers,
  • Document identification numbers (e.g., International Standard Book Number (ISBN), International Standard Serial Number (ISSN), Digital Object Identifier (DOI)),
  • Vehicle identification numbers (VINs), and
  • Others.

These are the kinds of items that might be identified through pattern matching. The content differs from one instance to the next, but for many of them the general pattern of characters stays the same. The “Digital Object Identifier” is an interesting thing to include in this list from the patent application, and it’s probably worth a visit to the page I linked to above, to see what types of things can have a Digital Object Identifier (quite a few, actually).

Here’s a snippet from the document explaining how the easier patterns from some items can be identified and matched:

A postal address, for example, may contain information commonly associated with an address, such as a number (street or zip code), a street name, a street type (road, street, lane, etc.), a city name, and a state name in relative close proximity to one another. Similarly, tracking numbers for a particular company may contain the same format. For example, the United Parcel Service (UPS) uses the following three formats for its tracking numbers: 1Z 000 000 00 0000 000 0; 0000 0000 0000; and T000 0000 000. Therefore, these patterns of characters may be used to identify UPS tracking numbers. The other types of information identified above may contain their own patterns of characters.
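
The three UPS formats quoted above translate fairly directly into regular expressions. The sketch below is only an illustration, under two assumptions of mine: that each "0" in the quoted formats stands for a single digit, and that the spaces between groups are optional. The names format_to_regex, UPS_PATTERN, and find_ups_candidates are hypothetical and don't come from the patent application.

```python
import re

# The three formats exactly as quoted in the patent application.
UPS_FORMATS = [
    "1Z 000 000 00 0000 000 0",
    "0000 0000 0000",
    "T000 0000 000",
]


def format_to_regex(fmt):
    """Turn a format string into a regex fragment.

    Assumption: "0" means "any one digit" and a space means "optional whitespace".
    """
    parts = []
    for ch in fmt:
        if ch == "0":
            parts.append(r"\d")
        elif ch == " ":
            parts.append(r"\s?")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


# The lookarounds keep a match from starting or ending inside a longer
# run of letters or digits.
UPS_PATTERN = re.compile(
    r"(?<![0-9A-Za-z])(?:"
    + "|".join(format_to_regex(f) for f in UPS_FORMATS)
    + r")(?![0-9A-Za-z])"
)


def find_ups_candidates(text):
    """Return every substring of text that fits one of the three formats."""
    return [m.group(0) for m in UPS_PATTERN.finditer(text)]


print(find_ups_candidates("Your package 1Z 999 999 99 9999 999 9 shipped"))
# prints: ['1Z 999 999 99 9999 999 9']
```

A bare twelve-digit number fits the second format too, so a pattern on its own will produce false positives.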

In addition to looking at the patterns, the toolbar may also look for certain words near a candidate item. So, for a tracking number, it might expect to see words like these near the number:

“ship,” “shipment,” “shipping,” “track,” “tracking,” “delivery,” and “package.”
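
Here is a rough sketch of how a nearby-word check like that could work. The cue words are the ones listed in the patent application; the function name has_nearby_cue and the eight-word window are assumptions of mine, since the filing doesn't say how near "near" is.

```python
import re

# Cue words listed in the patent application for tracking numbers.
TRACKING_CUES = {"ship", "shipment", "shipping", "track", "tracking", "delivery", "package"}


def has_nearby_cue(text, start, end, cues=TRACKING_CUES, window=8):
    """Return True if a cue word appears within `window` words of text[start:end].

    The window size is an assumed value chosen for illustration.
    """
    before = re.findall(r"[a-z]+", text[:start].lower())
    after = re.findall(r"[a-z]+", text[end:].lower())
    nearby = before[max(0, len(before) - window):] + after[:window]
    return any(word in cues for word in nearby)


text = "Track your package: 1234 5678 9012 today"
start = text.index("1234")
end = start + len("1234 5678 9012")
print(has_nearby_cue(text, start, end))
# prints: True
```

The same twelve digits in a sentence like "Call 1234 5678 9012 for pizza" would fail the check, which is the point of looking at the surrounding words.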

The patent application goes on to explain how the button and Autolink drop-down might work and tie in with a mapping program and page, or with a tracking, book, or vehicle information provider. It also describes the process of underlining items on the page and inserting a hyperlink to a page that shows more information about the item the toolbar found.
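
The link-insertion step might look something like the sketch below, which wraps each recognized item in a hyperlink. It works on plain text rather than on a live page, and the function name autolink, the simplified pattern, and the example.com address are all placeholders of mine, not details from the patent application or a real provider.

```python
import re
from html import escape
from urllib.parse import quote


def autolink(text, pattern, url_template):
    """Return HTML in which each match of pattern is wrapped in a link.

    url_template is a hypothetical provider URL with "{}" where the item goes.
    """
    out = []
    last = 0
    for m in pattern.finditer(text):
        out.append(escape(text[last:m.start()]))
        item = m.group(0)
        # Drop internal whitespace before placing the item in the URL.
        href = url_template.format(quote(re.sub(r"\s+", "", item)))
        out.append('<a href="{}">{}</a>'.format(escape(href, quote=True), escape(item)))
        last = m.end()
    out.append(escape(text[last:]))
    return "".join(out)


# Simplified stand-in pattern for a twelve-digit tracking number.
pattern = re.compile(r"\b\d{4} \d{4} \d{4}\b")
print(autolink("Package 1234 5678 9012 shipped", pattern, "https://example.com/track?num={}"))
# prints: Package <a href="https://example.com/track?num=123456789012">1234 5678 9012</a> shipped
```

Browsers underline links by default, so wrapping the item in an anchor tag would cover both the underline and the hyperlink in a sketch like this one.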

It also provides detailed explanations of how this might work with tracking numbers and mailing addresses.

Conclusion

Google has limited the use of autolink to a handful of information items. This document gives us an idea of how they could possibly expand it, if they were so inclined.

9 thoughts on “Google Autolink Patent”

  1. Hi Bill,

    Quite strange, this patent has the same dates as this one, no. 20060129534, filed by Yahoo: “System and methods for ranking the relative value of terms in a multi-term search query using deletion prediction”.
    Cheers,
    Jean-Marie

  2. Hi Jean-Marie,

    You have a keen eye. I knew that the publication date was the same because they were published the same week, and patent applications all get published on the same day of the week, so that’s not a big surprise.

    I guess that it also shouldn’t surprise us that the documents took around the same amount of time to go from being a filed application to being a published one. There’s a requirement that patent applications be published within 18 months of being originally filed (35 U.S.C. 122, Confidential status of applications; publication of patent applications). They may have been published originally as provisional patents, and filed a little earlier – and then updated as full patents and administered by the patent office on the same date. Or a request to publish both a little early may have been made.

    Unlike granted patents, the name of the examiner isn’t on these applications, but they could share the same examiner since the topics are related enough for someone having expertise in these types of topics to have reviewed both.

    You have me looking for other similarities now. :)

    Here’s one:

    They also share a “US Class” of 707/3, which has two parts. The first part, Class 707, stands for Data Processing: Database and File Management or Data Structures. The 3 in the second part of it stands for Query processing (i.e., searching).

    They aren’t completely in the same class though, since the Google patent application also has a second “US Class” listed, which has a different first part – 715, and a different second part – 501.1, which is defined as Hypermedia.

    The full definition of that “Hypermedia” is interesting:

    Subject matter wherein the textual information includes embedded links or format codes that direct process flow to alternate or additional displays.

    (1) Note. Processing of a document containing embedded links which, when selected or processed, changes the display to other portions of the same document or to other documents is classified herein.

    Thanks for mentioning the shared dates.

  3. Hi Bill,

    Wooow! Impressive investigation, thanks for the details. On my own, I found an Overture document which is at the origin of the Yahoo patent, and is only available through Google’s cache.
    Jean-Marie

  4. Excellent research!

    Thanks, Jean-Marie.

    It’s nice to be able to look at some of the work that inspired the creation and publication of a patent filing, especially when it expresses the ideas behind the document in such clear language.

    I found another copy, Query word deletion prediction, which isn’t in the cache. It was originally presented as a poster during the SIGIR in 2003, and has been referred to in a few other documents. I’m going to make a followup post on those.

    Again, thank you.

  5. Hi Clark,

    Google has tried a number of features that they’ve since retired, and recently retired several that were in Google Labs, including Google Sets, which had been around for a good number of years. I’m not exactly sure when Autolink was retired, but I think it was a few years ago.
