New Yahoo! Patent on Search and Similarity

Maybe there’s a little irony to the date that United States Patent 6,990,628, Method and apparatus for measuring similarity among electronic documents, was granted – January 24, 2006.

After all, that’s the day when many were saying that Yahoo! was giving up on matching or beating Google in the field of search, based upon some comments from the company’s Chief Financial Officer. Does this new patent assigned to Yahoo! hold hope for them to keep up, and maybe even surpass the present king of the search mountain, Google?

We may only find that out in the future, but it is an interesting document, and it covers a lot of ground. It’s worth poking through, and getting a sense of what it covers. A little more about the patent itself, first. The named inventors are Michael E. Palmer, Gordon Sun, and Hongyuan Zha. While it was granted on January 24, 2006, it was originally filed on June 14, 1999.

That filing date may be a little misleading. From the patents and other documents referenced in the patent application (which I’ve provided links to at the end of this post), it appears that the document evolved over time from when it was originally filed until when it was granted this week.

Continue reading


Design by Wilson Pickett

You probably won’t catch me on American Idol anytime soon. I know the limitations of my voice, and I could envision the scowl on Simon Cowell’s face if I tried. No need to go there.

I pretty much love most types of music, with the possible exception of Gregorian chants. There are a handful of songs that I hold special, though. These are the ones that sound good to me even within the confines of my shower. In the Midnight Hour, as performed by Mr. Wilson Pickett, is one of those.

And it’s not just the singing of the song that gets to me, but the way the whole thing comes together, the hook that instantly appeals, and yet the improvisation that makes it unique. As Wilson Pickett would have it, “You harmonize; then you customize.”

Love this post in Design Observer – Wilson Pickett, Design Theorist, 1942 – 2006. Great points, and a sweet tribute.

Continue reading


The struggles of Google in Korea

South Korea is one of the most Internet-advanced and connected countries in the world, yet Google has only 1.6 percent of the search traffic there. Why the lack of success in an endeavor where they’ve seen so much acceptance in many other places in the world?

Do a search for “Rain” in Google, and chances are good that you will get weather information. That’s true in the United States, and it’s also true in Korea. That’s part of the problem. There’s a famous singer in Korea whose name translates into “Rain.” Google’s results fail to turn up any information about this celebrity. Yet the information isn’t difficult to find on Korean sites.

The Korea Times uses that example in “Why Is Google Struggling in Korea?” (no longer available).

Google has been offering search in South Korea since 2001. But they haven’t been incorporating User Created Content (UCC) the way that local Korean search engines have.

Continue reading


Center for Web Research, Centre d’Innovació Barcelona Media, and Yahoo! Research Join Forces

Yahoo! is venturing into new territories, with the help of one of the most well-known search scientists in the world. Dr. Ricardo Baeza-Yates will be working with Yahoo! in new offices in Santiago and in Barcelona.

Yahoo! Research is moving into Santiago, Chile, after signing an agreement on January 9th with Universidad de Chile to open a new research center.

The research center will be run in collaboration with the Center for Web Research (CWR), which focuses upon different aspects of research on the web. Present from Yahoo! at the announcement of the agreement were Dr. Usama Fayyad, Data chief and Sr. Vice President at Yahoo!; Dr. Prabhakar Raghavan, Director of Yahoo! Research; and Roberto Alonso, Vice President and Management Director of Yahoo! for Latin America.

Dr. Ricardo Baeza-Yates has been hired by Yahoo! to be in charge of the research center in Santiago, as well as another in Barcelona, Spain. He has been the director of the CWR, and is very well known in the search community as the co-author of Modern Information Retrieval and many academic texts on data mining, information retrieval, and search.

Continue reading


Searching by phone: mobile camera phones and internet search

With Google looking at advertisements on phones, I was wondering how much more mobile phones might be capable of doing. There are some interesting answers to that question. The one I’m writing about today involves a different way of providing information to search engines, and other uses for phones.

The next really big step in providing search results to people?

Could it be in making the search engine transparent to the process?

Imagine pointing your phone at something, and taking a picture of it, and getting search results, or being led directly to a web page that was relevant to the picture.

Continue reading


Yahoo Acquisitions – The Middle Years

This is the third post in a series about the companies that Yahoo! has purchased.

I started with a look at the most recent purchases in Yahoo! Acquisitions since Overture. Sometime after I made that post, we discovered that Yahoo! had also acquired a company named Webjay during 2005.

My second post looked at Early Yahoo! Acquisitions (the 1990s). While looking for those, I was amazed by the very large number of companies that Yahoo! partnered with for one reason or another.

This post includes some of Yahoo!’s acquisitions which probably have had the biggest impact on the search results and the advertisements that Yahoo! serves.

Continue reading


Google’s most popular and least popular top level domains

What are the most popularly used top level domains, or at least, which are the ones that show up on pages indexed in Google?

I wondered this yesterday after seeing a news article stating that the registration of .cn (China) top level domain names topped 1 million for the first time ever by the end of 2005. The seed for my wonderment was probably planted when EGOL, at Cre8asite Forums, asked about using a .info top level domain earlier that day.

So I decided to check to see which were the most popular in Google, since that was the easiest place to get some statistics.

I found a couple of lists of top level domains (generic tlds and country code tlds), and searched for the number of results that appeared in Google, using the advanced “site” search operator and my tld lists. For example, a search for “site:.com” without the quotation marks might show me approximately how many pages appear in Google’s index that are on sites using a “.com” top level domain.
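For anyone who wants to repeat the exercise, here is a minimal sketch of how the queries could be put together. It only builds the search strings and the matching search URLs; the result counts would still be read off the results pages by hand. The short tld lists, the function names, and the exact “site:.” query form are my own assumptions for illustration, not the lists or queries used for this post.

```python
from urllib.parse import urlencode

# Hypothetical sample lists; the full tld lists used for the post are not shown here.
GENERIC_TLDS = ["com", "net", "org", "info"]
COUNTRY_CODE_TLDS = ["cn", "kr", "cl", "es"]


def site_query(tld: str) -> str:
    """Return a site-restricted query for a tld (assumed form: 'site:.com')."""
    return "site:." + tld.strip().lstrip(".").lower()


def search_url(query: str) -> str:
    """Build a Google search URL for the query; counts are then read manually."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    for tld in GENERIC_TLDS + COUNTRY_CODE_TLDS:
        query = site_query(tld)
        print(query, search_url(query))
```

The counts that Google reports for queries like these are estimates, so they are better suited to rough comparisons between tlds than to exact totals.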

Continue reading


Around the web

Some sites and stories I’ve seen recently that I wanted to share.

I’m a big fan of RSS feeds, and think that they give many sites a chance to have a much larger readership than they would otherwise. How widespread has the use of syndication through RSS feeds grown? Ravenews takes a look at the use of RSS last year in RSS Year in Review. (via Dana VanDen Heuval)

Over the last few years, it’s become increasingly clear that Dr. Jakob Nielsen knows at least as much about marketing himself as he does about usability, if not more. I’m not sure that there are too many other people online who can attract as much attention with an article as he can, and he’s done a good job of doing so with his latest, Search Engines as Leeches on the Web.

I’m finding it difficult to agree with some of his opinions, and this is an opinion piece without any usability or scientific backing behind it, but I do agree that it isn’t a good idea to rely solely on search engines, and their paid and organic listings. Danny Sullivan has a very nice response to a number of the issues that Dr. Nielsen raises at: Search Engines As Leeches, The Difference Between Paid & Free Listings & Keyword Price Rises.

Continue reading


Getting Information about Search and SEO Directly from the Search Engines