How Google Might Personalize Search Results Outside of Personalized Search
Not long ago, during a search at Google, a message at the top of the search results told me that my results were,
“Customized based on recent search activity.”
A link next to that message provided more information, telling me that if I signed into my Google Account, I might see “even more relevant, useful results,” based upon my “web history.”
During another recent search, a similar message appeared telling me that my results were based upon my location, with the results biased towards Philadelphia, which isn’t too far away.
Since then, I’ve been wondering what Google considers when it changes my search results like that. The major commercial search engines act as an index to the Web for the many people who rely upon them when looking for information online.
Imagine an index that changes for every searcher.
What might that mean to searchers and to the site owners who hope that search engines will help people find their pages? What information might Google be looking at when it customizes search results based upon “recent search activity,” or the location of a searcher?
We may have started receiving some clues…
Last week, Google was granted patents describing how it could change the order of pages in search results based upon the preferred language of a searcher, or upon what appeared to be their preferred country for results. Both patents were filed at about the same time, back in 2003, and both show ways of personalizing search results based upon information that could make the results searchers see more meaningful to them.
So Google might change the ordering of your search results based upon which country and which language you might “prefer” to see search results in, and it might even provide different search results to different searchers based upon an even wider range of differences between those searchers.
Another patent granted to Google this week, also originally filed in 2003, looks at different “populations” that a searcher may be part of, so that it can reorder pages within search results based upon the user behavior of those populations, such as which pages members of those populations click through to when presented with search results. The patent is:
Methods and systems for improving a search ranking using population information
Invented by Simon Tong and Mark Pearson
Assigned to Google
US Patent 7,454,417
Granted November 18, 2008
Filed September 12, 2003
Systems and methods that improve search rankings for a search query by using data associated with queries related to the search query are described.
In one aspect, a search query is received, a population associated with the search query is determined, an article (such as a webpage) associated with the search query is determined, and a ranking score for the article based at least in part on data associated with the population is determined.
Algorithms and types of data associated with a population useful in carrying out such systems and methods are described.
What is Population Information?
According to the patent, after a searcher performs a search, and a search engine retrieves a listing of pages in response to that search, ranked in order of relevance, the search engine might then look to see if there is a population signal associated with the searcher, and reorder the search results that searcher is shown.
We are told that population information about searchers might broadly include such things as:
- The locations of users,
- The populations with which users are associated, and;
- Information about groups with which users are associated.
Location information might include:
- a continent,
- a region,
- a country,
- a state,
- a county, or;
- a city.
Populations with which users may be associated might be based upon:
- a gender,
- a demographic,
- an ethnicity,
- a continent,
- a region,
- a country,
- a state,
- a county, or;
- a city.
An example of a population with which searchers could be associated might be age ranges of those searchers, such as “under 18 years old,” “18-24 years old,” “25-34 years old,” “35-49 years old,” “50-62 years old,” and “over 62 years old.”
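Assigning a searcher to one of those age ranges could be sketched as below. The function name and the use of a sorted list of boundaries are illustrative assumptions, not something the patent describes; only the six range labels come from the patent's example.

```python
from bisect import bisect_right

# Lower bounds of each range after the first (hypothetical representation).
AGE_BOUNDS = [18, 25, 35, 50, 63]
AGE_LABELS = [
    "under 18 years old",
    "18-24 years old",
    "25-34 years old",
    "35-49 years old",
    "50-62 years old",
    "over 62 years old",
]


def age_population(age: int) -> str:
    """Map an age in whole years to one of the example age-range populations."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many boundaries are <= age, which is the label index.
    return AGE_LABELS[bisect_right(AGE_BOUNDS, age)]
```

For instance, `age_population(24)` falls in “18-24 years old” and `age_population(63)` in “over 62 years old.”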
A slightly different way of differentiating searchers might involve looking at “groups” with which searchers are associated, based upon things such as:
- a gender,
- a demographic group,
- an ethnic group,
- persons with a shared characteristic,
- persons with a shared interest, and;
- persons grouped by a predetermined selection.
The patent gives as examples of groups with which searchers could be associated “all persons interested in collecting ancient shark teeth,” and “all persons not interested in collecting ancient shark teeth.”
Self Identification and Automatic Identification Data
A search engine can collect signals about which populations we might be members of by looking at browser settings and other information that is independent of what we search for. The patent refers to this information as “self identification-type data” and “automatic identification-type data.”
Self identification-type data can include user registration data provided when you sign up for something like personalized search or other applications, user preference data such as a preferred language for search results, and other user-selected data.
Automatic identification-type data may include information collected in other ways, such as:
- the Internet protocol address of a searcher’s location,
- default data obtained from a searcher’s browser application program,
- cookies, and;
- other data collected from a searcher’s application program when the searcher’s application program interacts with a search engine.
When someone in Japan types in a search query such as “boating,” the search engine knows that the search comes from an Internet protocol address located in Japan, and may see that the searcher has their browser set to a Japanese language preference. When that searcher starts selecting search results, they may start choosing pages that are in the Japanese language.
The search results that they are shown may be reordered based upon that automatically collected information as well as the selections they make when searching. The searcher in Japan looking for [boating] may be shown search results such as “boating.co.jp,” which may be a good match for the population data that Google has collected about the searcher.
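As a rough illustration of the automatic identification-type data in that example, the sketch below combines a country code with the browser's language preference. The `PopulationSignal` class and both function names are hypothetical, the country is assumed to come from some IP-to-country lookup that is not shown, and the language is read from a standard `Accept-Language` header value; none of this is taken from the patent itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PopulationSignal:
    country: Optional[str]   # e.g. "JP"; assumed output of an IP-to-country lookup
    language: Optional[str]  # e.g. "ja"; from the browser's language preference


def preferred_language(accept_language: str) -> Optional[str]:
    """Return the primary language with the highest q-value in an Accept-Language value."""
    best_tag, best_q = None, -1.0
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        tag, params = tag.strip(), params.strip()
        q = 1.0  # a missing q parameter means full preference
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if tag and tag != "*" and q > best_q:
            best_tag, best_q = tag, q
    if best_tag is None:
        return None
    # Keep only the primary subtag: "ja-JP" becomes "ja".
    return best_tag.split("-")[0].lower()


def population_signal(ip_country: Optional[str], accept_language: str) -> PopulationSignal:
    """Bundle the automatically collected signals for one searcher."""
    return PopulationSignal(country=ip_country, language=preferred_language(accept_language))
```

A searcher whose address resolves to Japan and whose browser sends `ja,en-US;q=0.8` would end up with the signal `PopulationSignal(country="JP", language="ja")`.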
User Behavior and Population Information
If a searcher is identified as belonging to a specific population, that searcher’s search results may be reordered based upon information about how other members of that population interacted with web pages while searching or browsing the Web.
Our searcher in Japan, looking for pages about boating in Japanese, may have his or her search results reordered based upon which pages other searchers from Japan, looking for Japanese language results, selected within search results.
The reordering of those search results could be based upon other activities of those searchers who share similar population signals.
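A minimal sketch of that kind of reordering follows. It assumes a hypothetical click log keyed by population and query, and it simply adds each page's share of that population's clicks to its original relevance score. The scoring formula, the `weight` parameter, and all names are assumptions made for illustration rather than the method the patent claims.

```python
from typing import Dict, List, Tuple

# Hypothetical click log: (population, query) -> {url: number of selections}
ClickLog = Dict[Tuple[str, str], Dict[str, int]]


def rerank(
    results: List[Tuple[str, float]],
    population: str,
    query: str,
    clicks: ClickLog,
    weight: float = 1.0,
) -> List[str]:
    """Reorder (url, relevance) pairs using one population's click shares."""
    pop_clicks = clicks.get((population, query), {})
    total = sum(pop_clicks.values())

    def score(item: Tuple[str, float]) -> float:
        url, relevance = item
        # Share of this population's clicks that went to the page (0 if no data).
        share = pop_clicks.get(url, 0) / total if total else 0.0
        return relevance + weight * share

    return [url for url, _ in sorted(results, key=score, reverse=True)]


results = [("boating.com", 0.9), ("boating.co.jp", 0.8)]
clicks = {("JP", "boating"): {"boating.co.jp": 80, "boating.com": 20}}
japan_order = rerank(results, "JP", "boating", clicks)
```

With this made-up data, searchers in the “JP” population see boating.co.jp first, while a population with no click data keeps the original relevance order.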
Populations and Sub-Populations
Sometimes there isn’t very much user behavior activity collected for people in certain populations. That’s where the idea of sub-populations can be used to broaden the reordering of search results based upon population information.
People who live in Paris are a sub-population of people who live in France, and people who live in France are a sub-population of people who live in Europe, who are in turn a sub-population of people who live in the world.
If there isn’t much user behavior information collected for a specific query term for people who are members of the population of Paris, the search engine might then look at the amount of user behavior information for people who are members of the population of France, and then of Europe. If there is little or no user behavior information for a sub-population, information from the larger population that contains it might be used to reorder search results.
Any user behavior information about members of the population of Paris might be given greater weight than user behavior information from the larger population of France, which in turn might be given more weight than information from the even larger population of Europe.
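The fallback from a sub-population to larger populations could be sketched as follows. The hierarchy list, the level weights, and the `min_clicks` threshold are all made-up values, and each level's click tally is treated as a separate count for simplicity; the patent describes the idea of weighting narrower populations more heavily but this particular arithmetic is only an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical click log: (population, query) -> {url: number of selections}
ClickLog = Dict[Tuple[str, str], Dict[str, int]]

# Narrowest population first; the weights are illustrative, not from the patent.
HIERARCHY = ["Paris", "France", "Europe", "World"]
LEVEL_WEIGHTS = [8.0, 4.0, 2.0, 1.0]


def blended_click_share(
    url: str,
    query: str,
    clicks: ClickLog,
    hierarchy: List[str] = HIERARCHY,
    weights: List[float] = LEVEL_WEIGHTS,
    min_clicks: int = 100,
) -> float:
    """Widen from the narrowest population until enough clicks have been seen."""
    weighted_hits = 0.0
    weighted_total = 0.0
    seen = 0
    for population, w in zip(hierarchy, weights):
        counts = clicks.get((population, query), {})
        total = sum(counts.values())
        weighted_hits += w * counts.get(url, 0)
        weighted_total += w * total
        seen += total
        if seen >= min_clicks:
            break  # enough data; do not widen to a larger population
    return weighted_hits / weighted_total if weighted_total else 0.0
```

When Paris alone supplies at least `min_clicks` selections for a query, only Paris data is used; otherwise data for France, and then Europe, is blended in at a lower weight.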
Clickthroughs and Other Data
A clickthrough is the selection by a searcher of a page listed in search results in response to a particular query. These searcher clicks aren’t the only kind of data that a search engine may collect that might indicate that people are interested in certain pages when searching for specific queries.
Some other information that a search engine might consider when associating pages with members of different populations might include:
- How often a particular URL, document, or web page is shown in response to a search query;
- How many times a particular search query is asked by users from a particular location;
- How many times a particular search query is asked by users from a particular population;
- How many times a particular document is selected by users from a particular location;
- How many times a particular document is selected by users from a particular population;
- How many times a particular document is selected by users for a particular search query;
- The age or time a particular document has been posted on the Web, and;
- The identity of a source of a particular page on the Web.
Grouping searchers into different populations, and showing those searchers results in an order based upon the past behavior of other members of their population, means that the search results you see may be very different from the search results that I see. That is especially true if we are from very different populations: different countries, preferring different languages, showing different interests in what we tend to search for, and so on.
The patent refers to a “smoothing factor” which “reflects how much data is needed to trust a click signal.” This means that if there isn’t very much user behavior information collected for a specific population regarding a specific query, that information might not be used to reorder the search results that you see.
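One common way to express that kind of trust is to damp the click signal by the amount of data behind it. The sketch below does this with a simple n / (n + s) factor, where s plays the role of the smoothing factor. The formula, the function names, and the default value of 50 are assumptions chosen for illustration; the quoted description above is the only part that comes from the patent.

```python
def click_trust(clicks_observed: int, smoothing: float) -> float:
    """Return a value in [0, 1) that grows as more clicks are observed."""
    if clicks_observed <= 0:
        return 0.0
    return clicks_observed / (clicks_observed + smoothing)


def smoothed_score(
    relevance: float,
    click_share: float,
    clicks_observed: int,
    smoothing: float = 50.0,
) -> float:
    """Add a population's click share to relevance, scaled by how much it is trusted."""
    return relevance + click_trust(clicks_observed, smoothing) * click_share
```

Under these assumptions, a population with no recorded clicks for a query leaves the relevance score untouched, and a population with exactly as many clicks as the smoothing value contributes half of its click share.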
What this might mean for searchers is that Google might show you results in an order based upon what it thinks might interest you the most, based upon whatever information it can collect about you, and based upon what people whom it believes share population information with you found interesting enough to look at in the past.
What this might mean for site owners is that the search engine might rank your pages higher for some populations of searchers and lower for other populations of searchers based upon a wider range of information than just whether your site is relevant for specific query terms.