One of the challenges of the Google Book Search project has been finding a way to index all of the books it includes.
We don’t know the details of the technology used to index those books, but a little research uncovers some interesting information.
A post at Search Science this November covered Google’s award of a $107,112 grant to Rada Mihalcea. Xan Porter noted there that Professor Mihalcea’s research on “automatic extraction methods to retrieve significant information in books stored in electronic format” is likely what interested Google in getting her help with Google Print, now known as Google Book Search.
As a co-inventor of TextRank, described in TextRank: Bringing Order into Texts, she seems to have been an ideal candidate.
It’s impossible to tell whether TextRank is what is being used to index those books.
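Whatever Google actually uses, a small sketch may help show what TextRank does. In its keyword-extraction form, as described in the Mihalcea and Tarau paper, TextRank links words that co-occur within a small window and then runs a PageRank-style iteration (damping factor 0.85) to score each word by how well connected it is. This is a simplified sketch under stated assumptions: a short stopword list stands in for the paper’s part-of-speech filter, the paper’s step of merging adjacent keywords into phrases is left out, and the function name textrank_keywords is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a tiny stopword list replaces the part-of-speech filter
# (nouns and adjectives only) used in the TextRank paper.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for",
             "on", "with", "that", "this", "it", "as", "by", "be", "or"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank candidate words by running a PageRank-style iteration over a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Undirected graph: connect words that fall within `window` tokens of each other.
    # With window=2, only adjacent (post-filter) words are linked.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)

    # Iterate S(v) = (1 - d) + d * sum(S(u) / deg(u)) over the neighbors u of v.
    # The comprehension reads the previous round's scores before rebinding.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]
```

For example, in "alpha beta alpha gamma alpha delta" the word "alpha" is linked to every other word, so it rises to the top of the ranking, which is the intuition behind TextRank: terms connected to many other terms are treated as more central to the text.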
When I go into my local Starbucks, I’ll usually ask for a “large” chai or coffee. The word “venti,” which they use for their largest size, just can’t make its way out of my mouth.
Maybe it’s something about fake foreign languages. Venti means twenty, as in twenty ounces. There’s a pretentiousness to the sizing that I just can’t bring myself to buy into.
There is also a Dunkin’ Donuts in the small college town where I live, and I have no problem asking for a large coffee there. At Dunkin’ Donuts, a large is a large. There are also at least eight or nine other coffee places in town (it’s a highly caffeinated place). Most of them offer a better experience than either Starbucks or Dunkin’ Donuts, and a few are cheaper than Dunkin’ Donuts.
With news of the sale of Dunkin’ Donuts, I decided to take a look at their web site. There are some strange things going on there.
The first thing that struck me is that the site uses a secure protocol (https) throughout. I’m not sure why; there are no forms to fill out on the front page of the site. Do the folks who work on their web pages understand why a secure protocol exists? Maybe they have a reason for what they do. Their pages are getting indexed by the search engines, but it does look like they could use someone with SEO knowledge to take a serious look at them.
There are many different approaches you can take when developing the content for a web site. An interesting article from User Interface Engineering describes a method inspired by the Inuit.
The Inuit build stone figures, often resembling people, from rocks found near the spot where they build them. These markers can tell later viewers something about the place, or about the builder’s experiences there.
Can something like that give us ideas that inspire us when we design? It’s inspiring me.
Reassuring Users with Inukshuk Content describes a university site that decided to show what a student’s experience might be like at the school by using more than 40 detailed profiles of people who were students there or are associated with the school in some other way.
Google’s Personalized Search is a bookmark manager, and it personalizes searches based upon the bookmarks you collect, and more. In addition to using your bookmarks, it may also look at your browser’s history to inform searches.
The personalized search beta appears to be something I wrote about back in June:
If a Googler Offers you a Bookmark Manager, Punch Him
That thread includes an analysis of the patent application likely behind it, Methods and systems for personalized network searching, covering such things as:
More than URLs Measured
If you could have your dream email program, what would it be like? If you could have Google back up your efforts in building that program, you might just be Paul Buchheit.
A new patent application from Google Gmail engineer Paul Buchheit, published today, covers the snippets you might see in a search engine’s results, including results returned when searching for something like an email.
Variable length snippet generation
Here’s the abstract:
With the announcement that Alexa is opening up its data to folks at affordable prices, covered by Aaron at Threadwatch (Alexa suddenly relevant?) and in a number of other places, it might be time to take another look at Alexa. At least, the prices appear to be affordable; it’s hard to tell at this point.
Details are here: Alexa Web Search Platform (the original link on the Alexa domain is no longer around). A snippet:
The Alexa Web Search Platform provides public access to the vast web crawl collected by Alexa Internet. Users can search and process billions of documents — even create their own search engines — using Alexa’s search and publication tools.
Search engine optimization and paid search advertising share a similar goal. Both try to bring the right visitors to a web site, but even more important is that a visitor takes some action there that fulfills a goal of the site’s owner.
When a visitor completes one of these objectives, it is often referred to as a conversion. A conversion can include viewing a specific page, buying something, downloading a file, signing up for a membership or newsletter, or another action defined by the site’s owner.
In paid search, it can be difficult to tell how effective your campaign has been. Conversion tracking is one approach search engines like Google use to help advertisers understand how well their advertising is working. There’s a nice summary of how conversion tracking works in the O’Reilly article Understanding Google’s Conversion-Tracking Mechanism.
A deeper look into how it works is in the Google AdWords™: Conversion Tracking Guide (pdf).
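The general idea behind cookie-based conversion tracking can be sketched in a few lines: a cookie is set when someone clicks an ad, the conversion page reads that cookie back if the same person arrives there later, and the click is credited with a conversion if it happened within some attribution window. This is a toy model under those assumptions, not Google’s implementation; the 30-day window, the AdClick and ConversionEvent classes, and the summarize function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical attribution window; a real system chooses its own length.
ATTRIBUTION_WINDOW = timedelta(days=30)


@dataclass
class AdClick:
    cookie_id: str        # stands in for the cookie set when the ad is clicked
    clicked_at: datetime
    cost: float           # what the advertiser paid for this click


@dataclass
class ConversionEvent:
    cookie_id: str        # cookie read back when the visitor reaches the conversion page
    converted_at: datetime


def summarize(clicks, conversions, window=ATTRIBUTION_WINDOW):
    """Credit each click at most once if a conversion with the same cookie follows within the window."""
    converted = set()
    for event in conversions:
        for idx, click in enumerate(clicks):
            if (click.cookie_id == event.cookie_id
                    and click.clicked_at <= event.converted_at <= click.clicked_at + window):
                converted.add(idx)

    total_cost = sum(c.cost for c in clicks)
    n = len(converted)
    return {
        "clicks": len(clicks),
        "converted_clicks": n,
        "conversion_rate": n / len(clicks) if clicks else 0.0,
        "cost_per_conversion": total_cost / n if n else None,
    }
```

With three clicks costing a total of 2.00, where one visitor converts two days after clicking and another converts 45 days later, only the first conversion falls inside the window, so the sketch reports one converted click, a conversion rate of one in three, and a cost per conversion of 2.00. Those are the kinds of numbers that let an advertiser judge whether a campaign is paying off.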