Alexa collects and displays a fair amount of information about web sites, though I’ve always wondered about the value of that information because it appears to collect data only from people who have the Alexa Toolbar installed in their browser.
Alexa’s Patent on Meta Data for Related Sites
Analysis of search activities of users to identify related network sites
Invented by Brewster Kahle and Paul van der Merwe Sauer
Assigned to Alexa
US Patent 7,165,069
Granted January 16, 2007
Filed June 28, 1999
There are three different methods described in this patent for collecting information about sites to see how related they might be:
- A link (Web page identifier) analysis methodology;
- Two Web page usage analysis methodologies; and,
- A search results analysis methodology.
In the link analysis methodology, web pages are examined, and information is collected about the pages they link to, based upon how close the links to those pages appear to each other on the linking page.
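The patent doesn’t spell out an algorithm, but a minimal sketch of that kind of link-proximity scoring might look like the following. The function name, the `max_gap` cutoff, and the weighting are my own assumptions, not details from the patent:

```python
from itertools import combinations
from collections import Counter

def related_by_link_proximity(page_links, max_gap=3):
    """Hypothetical sketch: score pairs of linked-to sites as related
    when their links appear close together on the same page.
    `page_links` is a list of pages, each an ordered list of link targets."""
    scores = Counter()
    for links in page_links:
        for (i, a), (j, b) in combinations(enumerate(links), 2):
            gap = j - i
            if a != b and gap <= max_gap:
                # links that sit closer together contribute a higher score
                scores[frozenset((a, b))] += max_gap - gap + 1
    return scores

pages = [
    ["alexa.com", "archive.org", "example.com"],
    ["archive.org", "alexa.com"],
]
scores = related_by_link_proximity(pages)
```

Here adjacent links (gap 1) would count more toward relatedness than links separated by other links, which matches the patent’s general idea of closeness on the page.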
For the Web page usage analysis methodologies, information is collected about which pages are viewed during the course of a browser session, and the sequence in which they are seen. Sites viewed in the same session may be related to each other.
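Again as a sketch rather than anything the patent specifies, counting session co-occurrence could be as simple as tallying pairs of sites that show up in the same browsing session (names and data shapes here are my own):

```python
from itertools import combinations
from collections import Counter

def related_by_sessions(sessions):
    """Hypothetical sketch: count how often two sites are viewed
    within the same browser session. `sessions` is a list of
    per-session lists of visited sites."""
    co_views = Counter()
    for session in sessions:
        # deduplicate within a session, then count each co-viewed pair once
        for a, b in combinations(sorted(set(session)), 2):
            co_views[(a, b)] += 1
    return co_views

sessions = [
    ["news.com", "sports.com", "news.com"],
    ["news.com", "sports.com"],
    ["mail.com"],
]
co_views = related_by_sessions(sessions)
```

A fuller version would presumably also use the viewing sequence the patent mentions, for instance weighting consecutive visits more heavily.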
In the search results analysis methodology, information is aggregated from different searchers regarding which pages they view in response to search results generated by a search engine for particular search queries.
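One way to picture that aggregation, with made-up data structures of my own choosing, is to count pairs of pages that searchers click for the same query:

```python
from itertools import combinations
from collections import Counter

def related_by_query_clicks(click_log):
    """Hypothetical sketch: pages clicked for the same search query,
    aggregated across searchers, are candidates for being related.
    `click_log` maps a query to a list of per-searcher click lists."""
    pair_counts = Counter()
    for query, searcher_clicks in click_log.items():
        # union of pages any searcher clicked for this query
        pages = sorted({p for clicks in searcher_clicks for p in clicks})
        for a, b in combinations(pages, 2):
            pair_counts[(a, b)] += 1
    return pair_counts

click_log = {
    "open source archive": [["archive.org", "gutenberg.org"], ["archive.org"]],
    "web archive": [["archive.org", "gutenberg.org"]],
}
pair_counts = related_by_query_clicks(click_log)
```

Pages that keep getting clicked together across many different queries would accumulate the highest relatedness counts.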
Often, when you read about click-throughs in search results, it is in the context of a search engine trying to understand user behavior to refine the results it shows searchers: seeing which results users identify as best, or incorporating more than one algorithm into search results and seeing whether users pick results from one algorithm over another. I’ve also read at least one study in which pairs of results were flip-flopped to test for bias towards the ordering of results, as opposed to their perceived relevance based upon the contents of their titles and snippets.
This is the first patent or paper I can recall that describes a third party using the selection of search results to try to collect meaningful data about pages that appear in those results, and their relationships to each other.
If you visit the Alexa page for a specific site, they do often have a list of “related” sites, and I’ve wondered how they come up with that list. This three pronged approach of looking at link proximity on pages, seeing which sites appear together in browsing sessions, and exploring aggregated clickthrough data for specific queries makes some sense. I’m not sure that any of the three approaches by themselves is tremendously helpful, but using them together might add to their value.
It’s interesting to see how a service like Alexa might use information gathered through their toolbar, especially involving interactions with search engines. It has me wondering how people might try to use information compiled from visits to social networking and bookmarking sites like MySpace or Digg or Flickr or Delicious.