Looking at Google in China

There’s been a lot of discussion on the web and in the news over the past few weeks about Google’s operations in China.

The Chinese version of their site, Google.cn, filters out content that the Chinese government doesn’t want included in search results. As noted in the Stanford Daily (link no longer available), the Chinese-language version of Google.com is unfiltered.

An issue recently arose regarding whether or not Google had a business license to even operate Google.cn, though that problem seems to have now been resolved, with a license granted after Google made a deal with Ganji.com to use their license.

One of the more interesting sets of commentary on Google in China comes from economics Professor Gary Becker and Federal District Judge Richard Posner, in their posts at The Becker-Posner Blog.

Looking at Google Definitions

Over at Threadwatch, Graywolf started a thread titled “Are you Optimizing for Google Definitions?” There are some insightful comments in the thread, and it reminded me of a Google patent application that covers the topic.

I looked around the web to see if there had been any discussion about the patent application, but couldn’t find any. The document is System and method for providing definitions (US Patent Application 20040236739), invented by Craig Nevill-Manning, filed on June 27, 2003, and published on November 25, 2004.

The abstract for the application is pretty general, but the document is fairly detailed. Here’s the abstract:

A system and method for providing definitions is described. A phrase to be defined is received. One or more documents, which each contain at least one definition, are determined. The phrase is matched to at least one of the definitions. One or more definitions for the phrase are presented.
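To make the four steps in the abstract a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not the method from the patent application: the abstract doesn’t say how definitions are recognized, so the “term: definition” line pattern, the function names, and the sample documents are all hypothetical.

```python
import re

# Hypothetical pattern for a definition line ("term: definition"). The abstract
# does not say how definitions are actually recognized in a document.
DEFINITION_LINE = re.compile(r"^\s*([^:\n]+?)\s*:\s*(.+?)\s*$")


def extract_definitions(document):
    """Return (term, definition) pairs found in one document's text."""
    pairs = []
    for line in document.splitlines():
        match = DEFINITION_LINE.match(line)
        if match:
            # Lowercase the term so matching is case-insensitive.
            pairs.append((match.group(1).lower(), match.group(2)))
    return pairs


def define(phrase, documents):
    """Match a received phrase against definitions drawn from the documents."""
    wanted = phrase.strip().lower()
    results = []
    for document in documents:
        for term, definition in extract_definitions(document):
            if term == wanted:
                results.append(definition)
    return results


# Made-up sample documents, each containing at least one definition.
documents = [
    "PageRank: a link analysis score\nCrawler: a program that fetches pages",
    "pagerank: a measure of page importance",
]

# Present the definitions found for the phrase.
for definition in define("PageRank", documents):
    print(definition)
```

An exact, case-insensitive match on the term is the simplest possible choice here. A real system would presumably need something more forgiving, but the abstract gives no details on that.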

Google looks at multi-stage query processing

Determining how a term or phrase may be used in the context of a page can be helpful in deciding how relevant that page is in responding to a query from a searcher.

A Google patent application published this week looks at possible ways of considering the context of those words, and describes a multiple-stage process to determine relevancy and find results for a search.

Figure: Multi-Stage Query Processing Description Flowchart

The document is fairly complex, but some possible actions that can be taken during the different stages described are:

Continue reading

More Analytics from Google

I’ve been patiently waiting for the chance to try out Measure Map. It’s an interesting-looking analytics tool that hasn’t gotten much past the testing stage. I’ve seen a few blog posts over the past couple of months from people who have been using it and enjoying it.

A new blog post (from Jeffrey Veen, of the Google Measure Map Team) explains why I might never get that invite, with Measure Map now part of Google. Congratulations to the folks at Adaptive Path on their success in what was their “first initiative to develop products in-house.” It appears that Jeffrey Veen and some of the other folks who worked on Measure Map will be leaving Adaptive Path to join Google.

Estimations of the sizes of Google, Yahoo!, and MSN

How much information is included in the databases of the different search engines? How do these numbers strike you?

Google 53 Billion Pages
Yahoo! 8.4 Billion Pages
MSN 3.7 Billion Pages

Those are estimates from four researchers at the Stanford University Dept. of Computer Science, who have come up with a method of Estimating the Index Sizes of Search Engines (the article has been removed from the Stanford pages – see comments below for more details) based only upon information that could be gathered from the public interfaces of the search engines.

There are a number of questions about the results that they received, but the researchers do discuss potential errors that may throw off their numbers.

Move over PageRank: Google’s looking at phrases?

Google isn’t the biggest search engine that Anna Lynn Patterson has worked on. That distinction probably falls to the Internet Archive, which she worked on before joining Google, and which likely has a few billion more pages in its database than Google (the archive has 55 billion web pages right now).

In addition to that feat, Anna is the writer of a pretty good article on search engines, over at ACM Queue, titled Why Writing Your Own Search Engine is Hard.

The latest search engine description from Anna Patterson, published yesterday, involves a search engine immune to Google Bombing. It could be said to reward authors for well-written HTML and good punctuation. It can find relevant pages that don’t include the query terms, even though it is immune to Google Bombing. She also finds a way to perform personalization with the search engine, and to detect and eliminate duplicates.

The search engine that she has conceived of can also be set to serve a mix of relevant pages from different topics in search results to searchers. For example, a search for “Blues” could easily be set to display pages on the first page of search results that lead to:

Continue reading

Gary Price Moves to Ask Jeeves

Search Engine Watch editor Gary Price will be joining Ask Jeeves as Director of Online Information Resources. He will be leaving his editorial position at Search Engine Watch to lead an outreach program at Ask, where he will work with the library and education communities and provide advice on new search products for the company.

It’s a terrific move for Ask Jeeves, and I wish Gary much joy in his new role. His participation at Search Engine Watch will be missed. Gary has more on the change over at ResourceShelf.

Usability and Search Engine Pages, Paranoia and Personalized Search

We often focus on how search engines respond to queries here, but don’t often look too closely at the pages of the search engines themselves.

How important a role does usability play in determining which search engine a person will use?

One important aspect of search is how quickly results are retrieved. That amount of time seems insignificant these days, but I remember a time not too long ago when you would have to watch your screen for a number of seconds before a list of results appeared in front of you.

Is it important to still see something like this after getting some results from Google:

Results 1 – 100 of about 38,000,000 for search usability. (0.36 seconds)

Getting Information about Search and SEO Directly from the Search Engines