A recent comment here noted that the core algorithm behind how Google works hasn’t changed very much since its earliest days. I’m not sure that I agree. Many of the posts I’ve made over the past five years that involve Google patents and whitepapers describe ways that Google may be changing how it determines which results to show searchers.
Many of the changes Google makes to its algorithms aren’t visible to its users, while others that change the interface we see when we search tend to stand out more. Interestingly, many of the changes that Google makes are based upon live tests that we may catch a glimpse of if we are lucky and pay attention.
Google’s Testing Infrastructure
At Google, experimentation is practically a mantra; we evaluate almost every change that potentially affects what our users experience. Such changes include not only obvious user-visible changes such as modifications to a user interface, but also more subtle changes such as different machine learning algorithms that might affect ranking or content selection…
Continue reading “We’re All Google’s Lab Rats”
When you arrive at a web page, the owner of that page might start collecting information about your visit for a number of reasons. One of the most commonly collected pieces of information is an internet protocol (or IP) address. An IP address is a number that can be associated with the way and the place that you access the Web.
The Difficulties of Using an IP Address as a Data Point
Your IP address might be assigned to a server or a router that you use to connect to the Web, or a proxy server or firewall that stands between the computer that you are using and the rest of the internet. You might go online on a computer that you share with other people at home or at a public place like a library, or at an office filled with other computers. You might share an IP address with roommates or family on the same computer, or use more than one computer through the same IP address.
A unique IP address might be assigned to your internet access every time you dial into the internet. Or it may be leased by your router on a weekly basis through your broadband provider, and may change if that lease isn’t automatically renewed by logging in within a certain amount of time after the lease period is over. If you access the web through an office, the IP address seen by the pages you visit might be that of your company’s firewall.
Continue reading “How and Why Google Might Estimate the Number of Users Behind an IP Address”
Two Microsoft papers being presented at this week’s SIGIR’10 conference in Geneva, Switzerland explore the topic of search trails: the pages that a searcher travels through after performing a search for a query, before reaching a final destination page.
The idea of delivering searchers to a final destination page, a page where previous searchers for a specific query often ended up before they either stopped searching or changed the focus of their search, is something that Microsoft has explored in the past.
I wrote about a patent filing from Microsoft a couple of years ago which explored how user behavior signals, such as how searchers browsed through pages to find information, might be used to rerank search results. The post, Search Trails: Destinations, Interactive Hubs, and Way Stations, took a look at how search trails (the pages browsed between an initial query and a final page visited) might offer useful query suggestions to searchers as well.
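To make the idea of a destination page a little more concrete, here is a minimal Python sketch that treats the last page of each logged trail as its destination and counts which destinations are most common for a query. The function name, the log format, and the example URLs are all assumptions made for illustration; this is not the method described in the Microsoft papers or patent filing.

```python
from collections import Counter, defaultdict


def popular_destinations(trails, top_n=1):
    """Return the most common final pages for each query.

    `trails` is an iterable of (query, pages) pairs, where `pages` is the
    ordered list of pages a searcher visited after issuing the query.
    This log format is a hypothetical one chosen for the example.
    """
    counts = defaultdict(Counter)
    for query, pages in trails:
        if pages:  # ignore trails where no page was visited
            counts[query][pages[-1]] += 1  # the last page is the destination
    return {
        query: [page for page, _ in counter.most_common(top_n)]
        for query, counter in counts.items()
    }


# Hypothetical search trails for a single query.
trails = [
    ("sigir 2010", ["a.example/search", "b.example/cfp", "sigir.example/2010"]),
    ("sigir 2010", ["c.example/news", "sigir.example/2010"]),
    ("sigir 2010", ["a.example/search", "d.example/blog"]),
]

result = popular_destinations(trails)
print(result)
```

In this toy log, two of the three trails end on the same page, so that page is reported as the popular destination for the query.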
That patent filing, and the 2007 SIGIR best paper, Studying the Use of Popular Destinations to Enhance Web Search Interaction (pdf) by Ryen W. White, Mikhail Bilenko, and Silviu Cucerzan, focused more upon the final destination pages found than the pages visited along the way. Ryen White is listed as a co-author in the earlier papers and patent filing on search trails, and he is one of the authors listed on the papers presented this week in Switzerland as well.
Continue reading “The Importance of the Journey: Search Trails and Destination Pages”
A search engine might use two sets of indexes: one for query terms that tend to show up in more searches and on more web pages, and another, larger index that includes terms that aren’t searched for as often and don’t appear on many web pages.
By showing results for some terms only from the smaller index, a search engine can retrieve information about pages which include those terms more quickly.
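The two-index arrangement can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class name, the page identifiers, and the rule of "use the main index if the term is there, otherwise fall back to the larger one" are all hypothetical, and the terms placed in each index are hard-coded rather than chosen by any real selection method.

```python
class TwoTierIndex:
    """Toy lookup over a small main index and a larger extended index."""

    def __init__(self, main, extended):
        self.main = main          # small index: term -> list of page ids
        self.extended = extended  # larger index: term -> list of page ids

    def search(self, term):
        # Terms held in the main index are answered from it alone,
        # which is the fast path described above.
        if term in self.main:
            return self.main[term], "main"
        # Everything else falls back to the larger index.
        return self.extended.get(term, []), "extended"


# Hypothetical index contents.
index = TwoTierIndex(
    main={"weather": ["p1", "p2"]},
    extended={"weather": ["p1", "p2", "p9"], "zymurgy": ["p7"]},
)

print(index.search("weather"))
print(index.search("zymurgy"))
```

A common term such as "weather" never touches the larger index here, while a rarer term such as "zymurgy" does.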
How would a search engine know which queries to search for in the smaller main index, and which to search for within that larger index?
I’ve written in the past about a patent on an extended index from Google, as well as a patent on a supplemental index from Microsoft, and both patents focused mostly upon how those indexes might be set up.
Continue reading “Head URLs and Tail URLs and Bing’s Supplemental Index?”
What are Named Entities?
Named Entities are specific people, places or things, and a focus of what Google might look for when returning information about queries. Google got a lot smarter in answering questions about named entities with the acquisition of Metaweb, which had developed a way of better understanding named entities in searches for them, and which Google appears to have adopted.
Here is an example of how Metaweb handled named entities, as described in one of the patents they had been granted:
You may know him by a number of names or titles – Governor of California, Terminator, Governator, Conan the Barbarian, Kindergarten Cop, Mr. Universe, Mr. Olympia, Arnold Strong, Arnie, The Austrian Oak.
To Metaweb, Arnold Schwarzenegger is referred to as 9202a8c04000641f8000000000006567.
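One way to picture this is as a lookup table in which many names point to a single identifier. The Python sketch below is a simplified illustration, not Metaweb’s actual data model or API: the identifier and the names come from the example above, while the table, the `normalize` and `resolve` helpers, and the exact-match lookup are assumptions.

```python
# Identifier taken from the example above.
ARNOLD_ID = "9202a8c04000641f8000000000006567"

# Names from the example above, all pointing at the same entity.
ALIASES = [
    "Arnold Schwarzenegger", "Governor of California", "Terminator",
    "Governator", "Conan the Barbarian", "Kindergarten Cop",
    "Mr. Universe", "Mr. Olympia", "Arnold Strong", "Arnie",
    "The Austrian Oak",
]


def normalize(name):
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


# Hypothetical alias table: normalized name -> entity identifier.
ALIAS_TO_ID = {normalize(alias): ARNOLD_ID for alias in ALIASES}


def resolve(name):
    """Return the entity identifier for a name, or None if unknown."""
    return ALIAS_TO_ID.get(normalize(name))


print(resolve("Governator"))
print(resolve("  the austrian OAK "))
print(resolve("Someone Else"))
```

A flat table like this cannot tell apart a name that belongs to more than one entity (a film title and the actor who starred in it, for example), so a real system would presumably need more context than the name alone.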
Continue reading “Google Gets Smarter with Named Entities: Acquires MetaWeb”