Yahoo! was granted a new patent this morning. Note that the classifications discussed aren’t based upon relevance, but rather upon categories that documents retrieved by the search engine might belong to, such as language or region.
Associating documents with classifications and ranking documents based on classification weights
Inventors: Hongyuan Zha and Sean Suchter
Assigned to Yahoo! Inc.
US Patent 7,028,027
Granted April 11, 2006
Filed: September 30, 2002
A method and apparatus for associating documents with classification values and ranking documents based on classification weights is provided.
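Reading that abstract loosely, one way to picture this kind of ranking is a base relevance score scaled by a weight for each classification a document is associated with. The sketch below is purely illustrative — the document fields, class names, and weight values are all invented here, not taken from the patent:

```python
# Hypothetical sketch of classification-weighted ranking. All field
# names, classes, and weights are invented for illustration.

def rank(documents, class_weights):
    """Order documents by base score scaled by the weight of each
    classification the document is associated with."""
    def weighted_score(doc):
        score = doc["score"]
        for c in doc["classes"]:
            # Classes without an explicit weight leave the score unchanged.
            score *= class_weights.get(c, 1.0)
        return score
    return sorted(documents, key=weighted_score, reverse=True)

docs = [
    {"url": "a.example", "score": 0.9, "classes": ["english"]},
    {"url": "b.example", "score": 0.8, "classes": ["german", "europe"]},
]
# A searcher in Germany might have region and language classes up-weighted,
# promoting the German-language page past a slightly higher base score.
weights = {"german": 1.5, "europe": 1.2}
ranked = rank(docs, weights)
```

The point of the sketch is only that classification weights can reorder results without touching the underlying relevance scores.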
Google was granted a patent today on a voice interface for searches.
There are a number of potential issues with understanding speech when trying to perform searches by voice, which are described by the patent filing.
With most speech recognition technology, error rates can be high when the vocabulary is large and the amount of dialogue is small. Those applications often need to be trained to recognize each speaker’s unique vocal inflections, and a search query is often limited to a handful of words or fewer.
Voice interfaces for search engines currently work around that problem by keeping the scope of communication small: asking the searcher to choose from a limited number of categories, and drilling down to smaller categories from restricted sets of choices.
This can cause voice search to be slow, and limited to pre-chosen categories from the search engine.
Ever use MSN’s sliders? Do you know about MSN’s sliders? If they changed, would you notice?
I’m not sure that I would.
A couple of days ago, I wrote about feature-based rankings at MSN. Rather than describing how understanding feature-based ranking (or fRank) might help when optimizing a site for MSN search, I focused upon some of the different categories that those features might fall into.
While those might be used in the future, or may even be used to help rank pages in a normal search on MSN now, there’s possibly another use for them, too.
In the future, we might see some of those features and categories appear in a different context, where the person searching has some control over which of those are most important to him or her. Another new patent application from Microsoft describes how. We can see some of that context in a somewhat obscure part of MSN’s search.
I compiled another quick list of links this week, but you might want to read them fast, because as Kid Mercury noted almost a week ago, On April 11, The Internet Gets Destroyed (No longer available)
OK, it might not be demolished, but it may just be broken in a number of places after Microsoft issues a new patch that treats some HTML tags involving embedded objects differently, instead of paying licensing fees for the technology.
Google Buys Search Algorithm Invented by Israeli Student
Ori Alon (or Allon), an Israeli student who has been studying at the University of New South Wales in Australia, appears to now be working in Google’s Mountain View offices. After a press release in September of last year, it appears that Microsoft, Yahoo!, and Google were all seeking this software, which finds links to related resources based upon text found on a page, starting from a query on a specific subject.
How does one optimize pages for MSN, given that they use a machine-based ranking system to rank results and return results to visitors?
Some new research from Microsoft, and some recently released patent applications might provide some ideas.
Before I dive into this, I want to point out Search Engines and Algorithms: Optimizing for MSN’s RankNet Technology by Jennifer Sullivan Cassidy, which takes a look at Microsoft’s RankNet technology. It’s a good introduction to some of the research that Microsoft has been doing lately.
Query-independent ranking
RankNet is discussed further in a paper to be presented in May at WWW2006, titled Beyond PageRank: Machine Learning for Static Ranking. It provides a detailed look at how human-ranked pages can be used to identify other high-quality pages, without relying upon the link structure of the web.
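As a rough illustration of the pairwise idea behind RankNet (as described in Microsoft’s published research, not any internal detail): when human judges prefer one page over another, training pushes the model to score the preferred page higher, using a cross-entropy loss on a sigmoid of the score difference. A minimal sketch, with invented scores:

```python
import math

def pairwise_loss(s_i, s_j):
    """RankNet-style cross-entropy loss for a pair where human judges
    preferred page i over page j: -log(sigmoid(s_i - s_j))."""
    return math.log(1.0 + math.exp(-(s_i - s_j)))

# The loss falls as the model scores the preferred page higher, so
# gradient descent nudges scores toward the human ordering.
loss_good = pairwise_loss(2.0, 0.0)   # model agrees with the judges
loss_bad = pairwise_loss(-2.0, 0.0)   # model disagrees
```

With such a model trained, every page can be assigned a static, query-independent score ahead of time — which is what makes it a candidate replacement for PageRank in the paper’s framing.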
New patent applications were published today at the US Patent and Trademark Office with the names of Google employees on them. Three look at how documents might be presented in an application like Google Book Search, and the other is an addition to patent filings that describe Google’s email system.
Searching scanned documents
The initial two are related to another patent application that was published last week, User interfaces for a document search engine, which involves searching scanned documents placed online.
This first application covers much of the same ground as last week’s published document, but not in as much detail. There are some details in this version that aren’t in the other one, but it feels as though this one is the first draft. They were filed on the same day.
User interface for presentation of a document
Inventor: Joe Sriver
US Patent Application 20060075327
Published April 6, 2006
Filed: September 29, 2004
Buying a house is one of the biggest decisions that someone can make these days. It’s a life-transforming step, regardless of whether the new home is a few miles away or across the country. And it’s one of the largest purchases many people will ever make.
There are some new looks to sites that focus on real estate lately. And a lot of information that was only available to real estate agents is being shared with people looking for homes.
If you haven’t seen zillow.com, which allows you to look at maps of locations and find houses for sale in those regions, you’ve missed out on a fun and interesting new mashup of mapping and data integration. Within the last day or so, news of Google showing real estate listings has also come out, though those are shown through the company’s Google Base service, rather than as a separate new listing service.
TechCrunch noted a week ago that Zillow has some competition in the mapping and display of homes for sale, in the shape of RealEstateABC. It’s kind of fun to look around these sites, and see what might be for sale around you. I wonder how helpful these tools are to people looking for homes.
A new patent application from Microsoft describes some ways to identify some of the spam pages that show up in search engine results. The research that led to the application started off by looking at something else entirely, but a chance discovery turned up some interesting results.
The initial research began with something Microsoft calls Pageturner. Pageturner is a project that looks at how often web pages update, and how frequently they might need to be crawled. It also looks at identifying duplicate and near duplicate content on web pages.
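Near-duplicate detection in that line of research is commonly described in terms of shingling: slide a window of k consecutive words over each page, and compare pages by the overlap of their shingle sets. The following is a generic sketch of that technique, not code from the Pageturner project:

```python
def shingles(text, k=3):
    """Set of k-word shingles (contiguous word windows) of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Resemblance of two shingle sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 1.0

# Two pages differing by one word still share most of their shingles.
d1 = shingles("the quick brown fox jumps over the lazy dog")
d2 = shingles("the quick brown fox leaps over the lazy dog")
sim = jaccard(d1, d2)
```

At web scale, systems in this literature sample shingles (for example with min-hashing) rather than comparing full sets, so that billions of pages can be clustered.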
The Microsoft researchers on that project found themselves being drawn to some very different research after looking at some of their results, especially from some pages located in Germany, which changed too quickly. Here are a couple of papers that describe some of the results of the original research:
On the Evolution of Clusters of Near Duplicate Web Pages (pdf)