Alexa collects and displays a fair amount of information about web sites, though I’ve always wondered about the value of that information, because it appears to be collected only from people who have the Alexa Toolbar installed in their browsers.
A new patent, granted to Alexa this morning but originally filed back in 1999, describes some of the data that could be collected from people with the toolbar, and how Alexa might use it to identify sites that are related to each other in some fashion. (The Alexa Privacy Policy, last updated in 2003 and since updated on 16 July 2009, seems to describe an even broader use of the information collected.)
Alexa’s Patent on Meta Data for Related Sites
Analysis of search activities of users to identify related network sites
Invented by Brewster Kahle and Paul van der Merwe Sauer
Assigned to Alexa
US Patent 7,165,069
Granted January 16, 2007
Filed June 28, 1999
This patent describes three different methods for collecting information about sites to determine how related they might be:
- A link (Web page identifier) analysis methodology;
- Two Web page usage analysis methodologies; and,
- A search results analysis methodology.
In the link analysis methodology, web pages are examined, and information is collected about the pages they link to, based upon how close those links are to each other on the page.
For the Web page usage analysis methodologies, information is collected about which pages are viewed during the course of a browser session, and the sequence in which they are seen. Sites viewed in the same session may be related to each other.
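The link proximity idea can be sketched roughly as follows. This is only an illustration under assumptions of my own, not the method from the patent: the function name, the `window` size, and the choice to weight each pair by `1 / distance` (so adjacent links count more than links a few positions apart) are all hypothetical.

```python
from collections import defaultdict


def link_proximity_scores(links, window=3):
    """Score pairs of linked pages by how close their links sit on one page.

    `links` is the ordered list of link targets as they appear on the page.
    `window` and the 1/distance weighting are hypothetical choices,
    not values taken from the patent.
    """
    scores = defaultdict(float)
    for i, a in enumerate(links):
        for distance in range(1, window + 1):
            j = i + distance
            if j >= len(links):
                break
            b = links[j]
            if a != b:
                # Store each pair in sorted order so (a, b) and (b, a) merge.
                scores[tuple(sorted((a, b)))] += 1.0 / distance
    return dict(scores)
```

For a page whose links appear in the order `["a", "b", "c"]` with `window=2`, the adjacent pairs each score `1.0` and the pair `("a", "c")` scores `0.5`. Scores from many pages could then be summed to build up a picture of which sites tend to be linked side by side.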
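One plausible reading of the two usage methodologies is that one looks at sites viewed anywhere in the same session, and the other looks at the order in which they were viewed. The sketch below counts both. It assumes each session is simply a list of site names in viewing order; the function name and data shape are hypothetical, and the patent may define sessions and sequences differently.

```python
from collections import Counter
from itertools import combinations


def session_cooccurrence(sessions):
    """Count co-visits and direct transitions across browsing sessions.

    `sessions` is a list of sessions; each session is a list of site names
    in the order they were viewed (a hypothetical data shape).
    Returns (together, transitions):
      together[(a, b)]    -> sessions in which both a and b were seen
      transitions[(a, b)] -> times a was followed directly by b
    """
    together = Counter()
    transitions = Counter()
    for visits in sessions:
        # Unordered co-occurrence: each pair counted once per session.
        for a, b in combinations(sorted(set(visits)), 2):
            together[(a, b)] += 1
        # Ordered signal: consecutive page views, ignoring reloads of one site.
        for prev, nxt in zip(visits, visits[1:]):
            if prev != nxt:
                transitions[(prev, nxt)] += 1
    return together, transitions
```

Keeping the two counters separate makes it easy to treat "seen in the same session" and "visited one right after the other" as different strengths of evidence.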
In the search results analysis methodology, information is aggregated from different searchers regarding which pages they view in response to search results generated by a search engine for particular search queries.
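A minimal sketch of the search results approach might treat sites that many different searchers click for the same query as related. The log format `(user_id, query, clicked_site)`, the function name, and the `min_users` threshold (meant to filter out one-off clicks) are my own assumptions for illustration, not details from the patent.

```python
from collections import Counter, defaultdict
from itertools import combinations


def related_from_clicks(click_log, min_users=2):
    """Find site pairs that searchers select for the same queries.

    `click_log` is an iterable of (user_id, query, clicked_site) tuples,
    a hypothetical log format. A site only counts for a query if at least
    `min_users` distinct users clicked it (a hypothetical threshold).
    Returns a Counter mapping (site_a, site_b) -> number of shared queries.
    """
    users_by_query_site = defaultdict(set)
    for user, query, site in click_log:
        users_by_query_site[(query, site)].add(user)

    # Keep only sites chosen by enough distinct searchers for each query.
    sites_by_query = defaultdict(set)
    for (query, site), users in users_by_query_site.items():
        if len(users) >= min_users:
            sites_by_query[query].add(site)

    related = Counter()
    for sites in sites_by_query.values():
        for a, b in combinations(sorted(sites), 2):
            related[(a, b)] += 1
    return related
```

Aggregating across distinct users, rather than raw clicks, is one way to keep a single enthusiastic searcher from making two sites look related.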
Often, when you read about click-throughs in search results, it is in the context of a search engine trying to understand user behavior in order to refine the results it shows, either by seeing which results users pick as best, or by blending results from more than one algorithm and seeing whether users prefer the results from one algorithm over another. I’ve also read at least one study in which pairs of results were flip-flopped to test whether searchers favor results because of their position in the rankings rather than their perceived relevance based upon their titles and snippets.
Conclusion
This is the first patent or paper I can recall that describes a third party using the selection of search results to try to collect meaningful data about the pages that appear in those results, and about their relationships to each other.
If you visit the Alexa page for a specific site, they often show a list of “related” sites, and I’ve wondered how they come up with that list. This three-pronged approach of looking at link proximity on pages, seeing which sites appear together in browsing sessions, and exploring aggregated clickthrough data for specific queries makes some sense. I’m not sure that any of the three approaches by itself is tremendously helpful, but using them together might add to their value.
It’s interesting to see how a service like Alexa might use information gathered through its toolbar, especially information involving interactions with search engines. It has me wondering how people might try to use information compiled from visits to social networking and bookmarking sites like MySpace, Digg, Flickr, or Delicious.
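To make the idea of using the three approaches together a little more concrete, here is a hypothetical way the signals could be blended. Each signal is scaled to a 0 to 1 range by dividing by its largest value, then combined with weights. The normalization, the weights, and the function names are all assumptions for illustration; the patent doesn't tell us how, or whether, Alexa combines them this way.

```python
def normalize(scores):
    """Scale a dict of pair -> score so the largest score becomes 1.0."""
    top = max(scores.values(), default=0)
    return {pair: value / top for pair, value in scores.items()} if top else {}


def combine(signals, weights):
    """Blend several pair -> score dicts using hypothetical weights."""
    if len(signals) != len(weights):
        raise ValueError("need exactly one weight per signal")
    combined = {}
    for scores, weight in zip(signals, weights):
        for pair, value in normalize(scores).items():
            combined[pair] = combined.get(pair, 0.0) + weight * value
    return combined


def related_sites(site, combined, top_n=5):
    """Return up to top_n sites most related to `site`, best first."""
    candidates = []
    for (a, b), score in combined.items():
        if site in (a, b):
            other = b if a == site else a
            candidates.append((score, other))
    # Highest score first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [other for _, other in candidates[:top_n]]
```

A pair that shows up in only one signal still gets some credit, while a pair supported by links, sessions, and clicks alike rises to the top, which is one way the combination could be worth more than any single approach.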
Hi Aaron,
That sounds like one of the assumptions I was making, too. I hadn’t really tried to pursue what Alexa was doing to gather related links before. If they are taking all of these steps, then it’s more interesting than I thought.
Sounds like a good example of the link analysis methodology, where they decided that links next to each other on the DMOZ page were related. 🙂
Webmasters can request to have their site added as “related” to another site. Alexa claims that they manually review all submissions. I have used it on many sites that compete with me.
Good point, Edward.
I’ve added a site as a related site before.
I’ve always thought this, too. For the related links, it seemed to me that sites that link to each other became related. But then I checked one of my sites that is listed in DMOZ and saw that other sites in the same category were listed as related. So I guess I was wrong. 🙂