Ask on Flagging Famous People

Does a search engine work better if it can figure out whether or not a search query is a name?

The folks at Ask appear to think so, and even want to know if the name is that of someone famous. I’m not sure how they measure fame, but they have a method for flagging names of the famous, as well as names that look like names, and names that really aren’t names (Brandy Alexander, anyone?).

[Image: rap star 50 Cent, from a search results page]

The process is described in a patent application from Ask, and details how they might go about figuring out whether “Usher” or “50 Cent” or “Attila the Hun” refer to people, or to something else completely.

Systems and methods for predicting if a query is a name
Invented by Eric J. Glover, Apostolos Gerasoulis and Vadim Bich
US Patent Application 20070239735
Published October 11, 2007
Filed: April 5, 2006

Continue reading “Ask on Flagging Famous People”

Mouse vs. EyeTracking: Mouse Wins, plus Successful Search Strategies

One of the workshops at SIGIR 2007, held in Amsterdam in July, was on Web Information Seeking and Interaction.

Web information seeking and interaction involves looking at the way that searchers interact with Web-based content and applications when they are looking for something. The conference covered a wide range of research, and I want to go into a little more detail on a couple of papers that were authored or co-authored by Google employees.

The papers and working notes from the workshop contain a nice mix of topics, and are worth taking a look at. The papers at that link that initially caught my attention were one on experiments with eye tracking and mouse movements, and another that explored strategies for Web search.

Exploring How Mouse Movements Relate to Eye Movements on Web Search Results Pages
Kerry Rodden (Google) and Xin Fu (University of North Carolina, Chapel Hill)

Continue reading “Mouse vs. EyeTracking: Mouse Wins, plus Successful Search Strategies”

Google on Determining Document Subjects

Fact extraction is growing as a method that search engines can use to identify and understand what pages on the Web are about, to collect facts about document subjects, and to answer questions posed by people submitting queries to a search engine.

A recent paper from Google provides a nice overview of some methods being used for fact extraction. A Google patent application published last week explores looking at titles on pages, and at anchor text in related pages on the same domain, to determine document subjects for pages.

The paper is Corroborate and Learn Facts from the Web (pdf), and the process described within it has been called GRAZER. Here’s a little about how it works:

It starts with facts imported from one website and takes them as known facts (seed facts). Then it tries to find mentions of the seed facts on other websites. This involves retrieving relevant pages for each entity and then corroborating facts in them.
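
To make the seed-fact idea a little more concrete, here is a minimal sketch in Python. It is not the method from the paper; it only illustrates the two steps described above (retrieve pages that mention an entity, then check whether those pages repeat a known fact about it). The sample facts, the page texts, the URLs, and the function names retrieve_pages and corroborate are all assumptions made up for illustration, and the matching is a naive substring test.

```python
from collections import defaultdict

# Hypothetical seed facts: (entity, attribute, value) triples imported from one site.
SEED_FACTS = [
    ("Attila the Hun", "died", "453"),
    ("Usher", "born", "1978"),
]

# Hypothetical stand-in for pages found on other sites: URL -> page text.
PAGES = {
    "http://example.org/attila": "Attila the Hun ruled the Huns until he died in 453.",
    "http://example.net/huns": "The Huns were led by Attila the Hun.",
    "http://example.com/usher": "Usher, born in 1978, is an American singer.",
}


def retrieve_pages(entity, pages):
    """Return (url, text) pairs whose text mentions the entity (naive substring match)."""
    needle = entity.lower()
    return [(url, text) for url, text in pages.items() if needle in text.lower()]


def corroborate(seed_facts, pages):
    """Map each seed fact to the URLs that mention both its entity and its value."""
    support = defaultdict(list)
    for entity, attribute, value in seed_facts:
        for url, text in retrieve_pages(entity, pages):
            # Assumption: a page "corroborates" a fact if the value appears in its text.
            if value.lower() in text.lower():
                support[(entity, attribute, value)].append(url)
    return dict(support)


RESULT = corroborate(SEED_FACTS, PAGES)
for fact, urls in RESULT.items():
    print(fact, "is supported by", len(urls), "page(s)")
```

A real system would need something much better than substring matching to decide that a page states a fact, but the shape is the same: known facts go in, and a count of independent pages backing each one comes out.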

Continue reading “Google on Determining Document Subjects”

Second Thoughts on a GPhone: Privacy and Targeted Ads

A couple of months ago, the Wall Street Journal provided some speculation about a Google phone in an article titled Google Pushes Tailored Phones To Win Lucrative Ad Market.

It’s difficult to tell how much is speculation, but Google CEO Eric Schmidt has gone on record as saying that he believes that consumers will watch targeted ads in exchange for free cell phone service.

How exactly would targeted ads work on a cell phone? Would they check out whom you’re calling, and target ads based upon whether you call the local cheap pizzeria for delivery or make reservations at very expensive eateries?

Depending on whether you prefer to stay at five-star hotels or at basic budget lodgings, how might the ads that you see on your phone differ? If most of the people you call have Italian last names, might pasta feature prominently in the targeted ads that you see?

Continue reading “Second Thoughts on a GPhone: Privacy and Targeted Ads”

Google’s Profiling Both Users and Sites?

A profile-based approach to individual site searches, and to group personalization (to protect individual privacy), is explored in two related patent applications from Google, one from this week, and one from last week.

The first patent application involves Google’s free site search, which a site owner can add to their site to help visitors find pages on that site containing the information they are seeking.

Their free site search doesn’t necessarily use the same algorithms that Google uses to index the Web. Instead, it may follow a different approach to ranking information on the sites where it is used, building profiles for those sites based upon searchers’ behavior at those sites.

The second patent application tries to protect individual privacy by creating “group” profiles for searchers, using a very similar profiling method.

Continue reading “Google’s Profiling Both Users and Sites?”

Google Learning Speech Recognition for Voice Search from MTV?

How might a voice search engine learn new words that have been introduced into popular speech, such as “da shiznet,” and learn and understand different pronunciations of words, such as might be found in spoken language based upon regional differences?

A newly published patent filing from Google provides some hints.

Last April, Google was granted a patent on a Voice interface for a search engine. I wrote about it in Google voice search patent granted.

That earlier patent filing introduces a number of topics around speech recognition, and tells us about things like a language model, which could learn new words and different pronunciations.

Since then, we’ve actually seen a voice search from Google introduced at GOOG-411.

Continue reading “Google Learning Speech Recognition for Voice Search from MTV?”