If Google had launched in the early 90s, it might have come out with technology that could be used to search some of the electronic databases of the day, prior to the World Wide Web, such as Lexis or Dialog. It would have developed useful ways to visualize results from those systems, along with custom user interfaces. It might have developed a progress bar that would show you that your search was taking place, and the system hadn’t failed, back when searches took more than milliseconds.
If Google had gotten its start before “WWW” had a place at the front of the names typed into a browser address bar, it might have developed technology very similar to what it’s working on today, but with a slightly different approach, one that can be sensed when reading through a number of Web-based patents from a company like Xerox.
Google was assigned 94 patent filings from Xerox (90 granted patents and 4 pending applications), as indicated by an assignment recorded by the United States Patent and Trademark Office (USPTO) last week, on February 16th, 2012. The execution date of the assignment is November 10, 2011. The USPTO assignment database doesn’t include any information regarding the details of the transaction, such as financial terms.
My last post linking Google and Xerox together was titled Xerox Brings Patent Infringement Suit Against Google, Yahoo, and YouTube. A look at the PACER records for the case (1:10-cv-00136-UNA) in US District Court for the District of Delaware shows it being closed on December 15th, 2011. The case docket includes a stipulation between Google and Xerox dismissing Google from the case on 11/11/11, the day after the assignment of these patents was executed. It appears that the assignments of the patents might have been related in some way to the stipulation, though the patents Xerox claimed were being infringed upon by Google and YouTube weren’t included in the assignment.
While the patent filings include a number outside of search and information retrieval, such as a few involving handheld devices, printing over a network, distributed networking systems, optical character recognition, and workflow processes, many of the patents do seem related to search based services that Google provides.
A number of the patents involved focus upon reviews and collaborative filtering of those reviews, caching of webpages in part and in whole, managing online documents, and what seems to be a large family of patents by the same or similar names that focus upon comparing and determining the quality of documents. Reading through a number of those, I was reminded that today is the one year anniversary of Google’s announcement of their Panda Algorithm.
The patents that focus upon document quality could potentially influence some aspects of the quality scoring of web pages that might be classified based upon an algorithmic machine learning approach such as Panda. Here’s the abstract from one of those patents:
Text, images, and/or graphics of electronic documents should be organized and laid out in a two-dimensional format for presentation to the viewer. The best such layout depends upon the content present, the creator’s intent, the output device, and the viewer’s interests. To analyze the qualitative nature of the layout in quantifiable terms, the electronic document is measured using various quantifiable factors; such as, balance, uniformity, white space management, alignment, consistency, legibility, etc.; that impact a qualitative nature of a document. Such quantifiable factors are then used to quantize the aesthetics, ease of use, eye-catching ability, interest, communicability, comfort, and convenience of the document.
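To make the idea in that abstract a little more concrete, here is a rough sketch of how a couple of layout factors might be turned into numbers and combined. It is purely illustrative and isn't taken from the patents themselves: the `Block` type, the two factor formulas, the assumption that more white space is better, and the equal weighting are all my own simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A rectangular content block on a page (hypothetical representation)."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height

    @property
    def center_x(self) -> float:
        return self.x + self.width / 2


def white_space_fraction(blocks, page_width, page_height):
    """Share of the page not covered by content (assumes blocks don't overlap)."""
    covered = sum(b.area for b in blocks)
    return max(0.0, 1.0 - covered / (page_width * page_height))


def horizontal_balance(blocks, page_width):
    """1.0 when the area-weighted center of the content sits on the page's
    vertical midline, falling toward 0.0 as it moves to either edge."""
    total = sum(b.area for b in blocks)
    if total == 0:
        return 1.0  # an empty page is trivially balanced
    centroid = sum(b.center_x * b.area for b in blocks) / total
    return 1.0 - abs(centroid - page_width / 2) / (page_width / 2)


def layout_score(blocks, page_width, page_height, weights=(0.5, 0.5)):
    """Weighted mean of the two factors; equal weights are an arbitrary assumption."""
    w_space, w_balance = weights
    space = white_space_fraction(blocks, page_width, page_height)
    balance = horizontal_balance(blocks, page_width)
    return (w_space * space + w_balance * balance) / (w_space + w_balance)


# Two centered bands of content versus one column pushed against the left edge.
centered = [Block(10, 10, 80, 20), Block(10, 40, 80, 20)]
lopsided = [Block(0, 0, 20, 100)]
print(round(layout_score(centered, 100, 100), 2))  # 0.84
print(round(layout_score(lopsided, 100, 100), 2))  # 0.5
```

A real system along the lines the abstract describes would presumably measure many more factors (alignment, consistency, legibility, and so on) and would need some principled way of weighting them, which is exactly the part this sketch glosses over.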
I haven’t had the chance to read through all of these and pick them apart. I’ll probably be doing that as time permits, but I thought it might be easier with more eyeballs on the patent filings. Here are the granted and pending patents that were included in the USPTO assignment:
Granted Patents
- Conversion Of Queries To Monotonically Increasing Increments Form To Continuously Query A Append Only Database (US Patent 5495600)
- Method And Apparatus For Concurrent Graphical Visualization Of A Database Search And Its Search History (US Patent 5515488)
- Method And Apparatus For Visualization Of Database Search Results (US Patent 5546529)
- Feature Library And Stored Customized Control Interfaces (US Patent 5604860)
- Document Job Key To Tailor Multifunctional User Interfaces (US Patent 5630079)
- Method And Apparatus For Time Estimation And Progress Feedback On Distal Access Operations (US Patent 5657450)
- Hierarchy Of Saving And Retrieving Control Templates (US Patent 5717439)
- Automated System For Indexing Graphical Documents Having Associated Text Labels (US Patent 5845288)
- Shared-Data Environment In Which Each File Has Independent Security Properties (US Patent 5930801)
- Method Of Indexing Words In Handwritten Document Images Using Image Hash Tables (US Patent 5953451)
- Integration Platform For Heterogeneous Databases (US Patent 5970490)
- Centralized Print Server For Interfacing One Or More Network Clients With A Plurality Of Printing Devices (US Patent 5974234)
- Centralized Print Service For Interfacing One Or More Network Clients With A Plurality Of Printing Devices (US Patent 6020973)
- System For Cloning Document Processing Related Settings In A Document Processing System (US Patent 6026436)
- Apparatus And Method For Loading And Reloading Html Pages Having Cacheable And Non-Cacheable Portions (US Patent 6061715)
- Apparatus And Method For Loading And Reloading Html Pages Having Cacheable And Non-Cacheable Portions (US Patent 6094662)
- Automatic Language Identification Using Both N-Gram And Word Information (US Patent 6167369)
- User Level Accessing Of Low-Level Computer System Operations (US Patent 6266670)
- Property Based Mechanism For Flexibility Supporting Front-End And Back-End Components Having Different Communication Protocols (US Patent 6269380)
- A User Level Controlled Mechanism Inter-Positioned In A Read/Write Path Of A Property-Based Document Management System (US Patent 6308179)
- System And Method For Using Noisy Collaborative Filtering To Rank And Present Items (US Patent 6321179)
- System And Method For Using Noisy Collaborative Filtering To Rank And Present Items (US Patent 6321232)
- System And Method For Collaborative Ranking Of Search Results Employing User And Group Profiles Derived From Document Collection Content Analysis (US Patent 6327590)
- Methods For Visualizing Transformations Among Related Series Of Graphs (US Patent 6369819)
- Direct Manipulation Interface For Document Properties (US Patent 6370538)
- System And Method For Bootstrapping A Collaborative Filtering System (US Patent 6389372)
- Secure Token-Based Document Server (US Patent 6397261)
- Augmented-Reality Display Method And System (US Patent 6408257)
- System And Method For Caching (US Patent 6415368)
- Remote Feature Delivery For Output Devices (US Patent 6424950)
- Mobile Document Paging Service (US Patent 6430601)
- Method For Providing Time Discrimination In The World Wide Web (US Patent 6470269)
- Mobile E-Mail Document Transaction Service (US Patent 6487189)
- System And Method For Searching And Recommending Documents In A Collection Using Shared Bookmarks (US Patent 6493702)
- Document Management System For Recording And Viewing The History Of Document Use (US Patent 6493731)
- Bristlelines: A Visualization For Discovery Relationships Between Sorted Web Documents And Their Usage Data (US Patent 6499034)
- Usage Based Methods Of Traversing And Displaying Generalized Graph Structures (US Patent 6509898)
- System And Method For Visually Representing The Contents Of A Multiple Data Object Cluster (US Patent 6564202)
- System And Method For Providing Recommendations Based On Multi-Modal User Clusters (US Patent 6567797)
- System And Method For Clustering Data Objects In A Collection (US Patent 6598054)
- System And Method For Analyzing Eyetracker Data (US Patent 6601021)
- Secure Token-Based Document Server (US Patent 6601102)
- System Of Indexing A Two Dimensional Pattern In A Document Drawing (US Patent 6621941)
- System And Method For Caching (US Patent 6631451)
- System And Method For Caching Of Reusable Objects (US Patent 6662270)
- System And Method For Predicting Web User Flow By Determining Association Strength Of Hypermedia Links (US Patent 6671711)
- Decentralized Network System (US Patent 6671737)
- Distributed Document-Based Calendaring System (US Patent 6675356)
- Systems And Methods Providing Flexible Representations Of Work (US Patent 6725428)
- System And Method For Information Browsing Using Multi-Modal Features (US Patent 6728752)
- Method And Apparatus For Formatting OCR Text (US Patent 6741745)
- Method For Monitoring And Encouraging Community Activity In A Networked Environment (US Patent 6742032)
- Knowledge Management System And Method (US Patent 6873430)
- Systems And Methods For Predicting Usage Of A Web Site Using Proximal Cues (US Patent 6907459)
- System And Method For Quantitatively Representing Data Objects In Vector Space (US Patent 6922699)
- System And Method For Identifying Similarities Among Objects In A Collection (US Patent 6941321)
- Electronic Board System (US Patent 6964022)
- System, Method And Article Of Manufacture For Cryptoserver-Based Auction (US Patent 6990468)
- System And Method For Constraint-Based Document Generation (US Patent 7010746)
- System And Method For Inferring User Information Need In Hypermedia Linked Document Collection (US Patent 7017110)
- System And Method For Measuring And Quantizing Document Quality (US Patent 7024022)
- System And Method For Measuring And Quantizing Document Quality (US Patent 7035438)
- System And Method For Measuring And Quantizing Document Quality (US Patent 7035439)
- System And Method For Providing A Site Specific Location Of A Device (US Patent 7054651)
- System And Method For Measuring And Quantizing Document Quality (US Patent 7072495)
- System And Method For Measuring And Quantizing Document Quality (US Patent 7092551)
- System And Method For Measuring And Quantizing Document Quality (US Patent 7092552)
- System And Method For Measuring And Quantizing Document Quality (US Patent 7095877)
- Method For Constraint-Based Document Generation (US Patent 7107525)
- Transparent Injection Of Specific Content Into Web Pages Viewed While Browsing (US Patent 7107526)
- System And Method For Measuring And Quantizing Document Quality (US Patent 7116802)
- System And Method For Measuring And Quantizing Document Quality (US Patent 7130450)
- System And Method For Measuring And Quantizing Document Quality (US Patent 7130451)
- System And Method For Measuring And Quantizing Document Quality (US Patent 7136511)
- Multi-Versioned Documents And Method For Creation And Use Thereof (US Patent 7171618)
- Viewing Tabular Data On Small Handheld Displays And Mobile Phones (US Patent 7200615)
- Method And System For Expertise Mapping Based On User Activity In Recommender Systems (US Patent 7240055)
- Method For Measuring And Quantizing Document Quality (US Patent 7260245)
- Method For Measuring And Quantizing Document Quality (US Patent 7266222)
- Method For Measuring And Quantizing Document Quality (US Patent 7269276)
- Method For Measuring And Quantizing Document Quality (US Patent 7277560)
- Method For Measuring And Quantizing Document Quality (US Patent 7280675)
- Method For Measuring And Quantizing Document Quality (US Patent 7283648)
- Flexible Rule-Based Recommender System For Use With An Electronic Board System (US Patent 7287024)
- Method For Measuring And Quantizing Document Quality (US Patent 7305107)
- Method For Measuring And Quantizing Document Quality (US Patent 7308116)
- Recommender System And Method (US Patent 7386547)
- Method For Determining Overall Effectiveness Of A Document (US Patent 7391885)
- Recommender System And Method (US Patent 7440943)
- Computerized Action Tool For Managing Print Parameters, Queuing Actions And Archiving Actions For A Document Output Management System (US Patent 7698650)
Pending Patent Applications
- Remote Feature Delivery for Output Devices (US Patent Application 20020161649)
- Multi-Versioned Documents and Method for Creation and Use Thereof (US Patent Application 20070061384)
- Remote Feature Delivery for Output Devices (US Patent Application 20070136137)
- Computerized Action Tool for Managing Print Parameters, Queuing Actions and Archiving Actions for a Document Output Management System (US Patent Application 20100175009)
Takeaways
Google has been acquiring a large number of pending and granted patents from other companies in the past couple of years. A number of those covered a very wide range of technologies, from sensor technology for driverless cars, to fiber optics networking processes and devices, to computer and database architecture, and more.
This acquisition seems a little more focused upon some of the core search technologies that Google is best known for, from some fairly old patents still focused upon search, to some newer patents that might help Google with its move towards improving its processes for reviews and recommendations and determining quality scores for documents on the Web. For anyone interested in how Google is evolving towards machine learning processes to rank web pages, there can be some value in spending some time going through these patents.
Absolutely, there is no doubt Google is evolving towards machine learning processes to rank web pages, and it’s incredibly awesome.
Special thanks for the patent about
“System and method for predicting web user flow by determining association strength of hypermedia links”
And after all, none of us can regret that Google is the lifeblood of the internet.
Hi Rajesh,
A machine learning system is often only as good as the data set that it uses to start out with. What I liked about a number of the patents involved in this transaction, like the document quality ones, is that they set out some baselines for defining quality that wouldn’t be so dependent upon different seed sets of “quality” pages. Without those, I think you run a greater risk of lowering the rankings of pages that don’t fall close enough to the mold of the sites you included in your seed set, yet which might still provide quality content, and a quality user experience.
A lot of the patents listed are related to document management systems, including pages, images, vector spaces, etc., which seems to be Xerox’s domain. That makes sense, since these patents were invented at a company that has had a huge impact on digital production through products like printers and photocopiers. Xerox, next to Adobe, is one of the most influential software developers.
What I am trying to say is that Google needs to use their trade partners, like Xerox, to improve their services.
Hi Martin,
Many of these patents are indeed a way of looking at documents in a manner that is very different from how Google might when trying to analyze them for search. I think the very different approach adds a level of sophistication that Google hasn’t had the chance to develop. It’s not really certain how Google might use them, but with approaches like Google’s Panda update, it seems the search engine is focused upon understanding how the layouts of pages might influence how people view and use them.
Hi Bill,
I wish Google well. Everywhere I hear that they are trying to help people design a new, better digital world, and that they are trying to fight SEO spammers. But I’m not sure whether the changes they have been making won’t just bring them more mess. They are trying to make searches more targeted and personalized, which seems to work against them.
Hi Martin,
Every algorithm change and every new ranking approach usually has some way that it might be manipulated and abused by people looking to do so. The way to combat that is to raise the cost of manipulation in time, expense, and effort, to the point where spamming becomes more expensive than not spamming.