A Google patent granted last week describes how the indexes at different Google data centers may contain some pages classified as global and others classified as regional. Last summer, I wrote about how Google may predict which data center might provide the best results for a query. Google was also granted a number of patents last August that provided some insights into how Google’s Planet Scale Distributed Storage of Data may work.
Those patents from last summer give us an intriguing but incomplete look at the pages contained in Google’s data centers, and the newly granted patent appears to fill in some significant gaps. Imagine that each data center might contain some unique pages and content that’s regional in nature, along with some global content that might be replicated across more than one data center. That global content could potentially take up between 50% and 75% of the storage area at each data center.
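The patent doesn’t spell out an implementation, but here’s a rough Python sketch of how such a global/regional split could work in principle. Everything in it is hypothetical: the data center names, the regions, and the placement_for function are my own illustrations of the idea, not anything taken from Google’s actual systems.

```python
# Hypothetical sketch of the global/regional split described above.
# Data center names, regions, and classifications are all invented for illustration.

DATA_CENTERS = {
    "us-east": "north-america",
    "eu-west": "europe",
    "asia-east": "asia",
}

def placement_for(page):
    """Return the data centers that should hold an indexed page.

    Pages classified as global are replicated to every data center;
    pages classified as regional live only in the data centers serving their region.
    """
    if page["classification"] == "global":
        return sorted(DATA_CENTERS)
    return sorted(name for name, region in DATA_CENTERS.items()
                  if region == page["region"])

pages = [
    {"url": "http://example.com/world-news", "classification": "global", "region": None},
    {"url": "http://example.de/stadtfest", "classification": "regional", "region": "europe"},
]

for page in pages:
    print(page["url"], "->", placement_for(page))
```

Under a split like this, the replicated global pages would be the portion taking up that 50% to 75% of storage at each data center, while the regional remainder would differ from one data center to the next.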
Nuance Communications, which partners with Apple to provide the voice recognition software behind Apple’s intelligent assistant Siri, had 4 patent applications published today at the USPTO that focus upon search and search technology. While the company has at least 274 granted patents and 104 pending patents listed as assigned to it at the US Patent and Trademark Office, these appear to be the first that focus upon the operations of a search engine. They reference the Dragon Search application built for the iPhone.
The topics covered in the Nuance patent portfolio primarily involve speech recognition technology, but they also include some areas that companies like Google have been focusing upon within a few of their own patents, such as statistical language models and document segmentation algorithms, as well as a patent for a voice web browser that was filed back in 1998.
When a judge writes a judicial opinion on a case, he often includes more than just his ruling. The opinion usually contains an analysis of the present law and the legal atmosphere, and explains how the ultimate holding in the case was arrived at. Those written rulings can also include legal opinions on issues that don’t necessarily play an essential role in the outcome of the case at hand, and those are often referred to as “dicta.”
When you read a patent, you’ll see that it’s broken into a number of parts. The most important of those is the claims section, which is what a patent examiner focuses upon when examining a patent application and deciding whether or not it should be granted. Patents also have description sections, which give a richer and more detailed look at how the technology behind a patent might be implemented (with emphasis on the “might”). Often those descriptions include material that isn’t reflected within the claims section, and in many ways those description sections could be considered similar to the dicta that, as I mentioned, sometimes appears within judicial opinions.
Stanford University was granted two new patents today under the name “Scoring documents in a database,” both of which were filed at the United States Patent and Trademark Office on January 19, 2010. These two patents, assigned to Stanford and listing Lawrence Page as inventor, are described as continuation patents of the following patents assigned to Stanford, which focus upon PageRank:
In the Google Inside Search blog, Google’s Amit Singhal published a post titled Search quality highlights: 40 changes for February that told us about many changes to how Google ranks pages, including the following:
Link evaluation. We often use characteristics of links to help us figure out the topic of a linked page. We have changed the way in which we evaluate links; in particular, we are turning off a method of link analysis that we used for several years. We often rearchitect or turn off parts of our scoring in order to keep our system maintainable, clean and understandable.
A lot of people were guessing which “method of link analysis” might have been changed, from PageRank being turned off, to anchor text being devalued, to Google ignoring rel=”nofollow” attributes in links, among other guesses. A few people asked for my opinion, and I mentioned that there were a number of potential approaches that Google might have changed.
I love local search. In many ways, it’s similar to Google’s Web Search, but with its own unique features. In addition to Googlebot, local search has Street View cars. In addition to looking at links, local search also looks for mentions of businesses that appear with location-based information. And instead of robots.txt files, local search is stopped by signs like “military base” or “private street.”
I also appreciate the local search ranking factors that a good number of people involved in local search have been putting together every year, but I’m a little apprehensive about them, and I’m going to illustrate why with this post. Imagine, for instance, that Google considers the names of businesses when it ranks them in local search, but that it doesn’t treat every name the same. For example, “Frost Diner” might be treated one way by Google Local Search.
And, because it has a somewhat longer name, “Red Truck Bakery” might be treated differently by the algorithms within Google’s Local Search that use business names as a ranking signal.
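To make that concrete, here’s a purely hypothetical Python sketch of one way a name-matching signal might treat names of different lengths differently. The name_match_score function and its scoring are my own invention for illustration, not anything Google has described; the point is simply that a longer name can dilute a partial match.

```python
# Purely hypothetical sketch: one way a name-matching signal *might* treat
# business names of different lengths differently. Not Google's algorithm.

def name_match_score(query, business_name):
    """Score how much of a business name is matched by the query.

    A longer name dilutes a partial match: matching one word of a
    three-word name counts for less than matching one word of a
    two-word name.
    """
    query_tokens = set(query.lower().split())
    name_tokens = business_name.lower().split()
    if not name_tokens:
        return 0.0
    matched = sum(1 for token in name_tokens if token in query_tokens)
    return matched / len(name_tokens)

print(name_match_score("frost diner warrenton", "Frost Diner"))    # 1.0 -- full name matched
print(name_match_score("bakery warrenton", "Red Truck Bakery"))    # ~0.33 -- partial match
```

If something even loosely like this were in play, two equally relevant businesses could earn different name-based scores simply because one name has more words in it, which is exactly the kind of nuance that a flat list of ranking factors can gloss over.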
If Google had launched in the early 90s, it might have come out with technology that could be used to search some of the electronic databases of the day, prior to the World Wide Web, such as Lexis or Dialog. It might have developed useful ways to visualize results from those systems, along with custom user interfaces. It might have developed a progress bar that would show you that your search was still taking place and that the system hadn’t failed, back when searches took more than milliseconds.
If Google had gotten its start before “www” had a place in front of domain names in a browser address bar, it might have developed technology very similar to what it’s working on today, but with a slightly different approach, one that can be sensed when reading through a number of Web-based patents from a company like Xerox.
Google was assigned 94 patents (90 granted and 4 pending) from Xerox, as indicated by an assignment recorded by the United States Patent Office last week, on February 16, 2012. The execution date of the assignment is November 10, 2011. The USPTO assignment database doesn’t include any information about the details of the transaction, such as financial terms.
Google has been busy over the past couple of years acquiring a good number of small startups, including some that may help, or have helped, contribute features to Google Plus, such as Fridge, tweet-counting SocialGrapple, people-sorting Katango, the team behind JustSpotted, social-ranking PostRank, and the social movie recommendation service fflick.
Google hasn’t publicly announced every acquisition it has made, and the search engine has also purchased intellectual property, such as pending and granted patents, from some companies without necessarily buying the companies behind those patents. For example, in August of 2010, Google was assigned a handful of patent filings from Appmail, LLC, in an assignment recorded at the USPTO in May of 2011. A pending and a granted patent from that group appear to be related to Grouptivity, a social service run by Appmail that used a social mail service to enable people to share content they found on the web with others, either privately or publicly. That service allowed for the creation of groups to “keep your personal contacts separate from co-workers and other categories.” As a publisher-centric web service, Grouptivity was described as a service that:
According to the United States Patent and Trademark Office (USPTO) assignment database, Google has acquired the pending patent applications of one-time search rival Cuil, which was touted at launch as a potential Google killer.
On July 28, 2008, the search engine Cuil went live, with many hoping that it would rival Google in technological know-how and create some competition for the search engine. Those hopes were fueled in part by the fact that the search engine was started by former Google employees Anna Patterson and Russell Power, along with co-founder Tom Costello, who came from IBM; they were also joined by AltaVista founder and ex-Googler Louis Monier. The company received a fair amount of funding before it launched, likely in part due to the past employment history of its founders.
When Cuil launched, it supposedly had within its index more than three times as many Web pages as Google, and ten times as many as Microsoft. It promised not to retain information about searchers’ past search histories or surfing patterns as a way of distinguishing itself from Google. Bloomberg News called it one of the most successful startups of 2008, and there were some very high hopes that the new search engine would rival Google.
Things seemed to start going south for Cuil shortly after launch. Within a month, Louis Monier had left the company after disagreements with CEO Tom Costello. Search results were presented in a two-column format rather than a single column, and were accompanied by thumbnail images. At the time, I noticed a few complaints about the two-column format, and in my personal experience using the site, the thumbnails presented often weren’t very good choices and weren’t representative of the pages or topics being returned in search results. The Cuil website shut down in September of 2010, with news of a mysterious acquisition falling through surfacing a week later.