In the recent predictions for 2007 post, a commenter asked about the impact of the Open Directory Project and its data on research in 2006, and what the information might be used for in 2007. I was also asked what I thought was the most interesting use.
Those were great questions. I hadn’t carefully tracked how often the Open Directory Project appeared in published patent applications and granted patents, though I had seen it mentioned in a number of papers in 2006. I thought it might be interesting to find mentions of DMOZ in patent filings that were either granted or published as applications during the year, and to see how the directory was referred to and how its data was used.
Below are links to those documents, who they are assigned to, and a brief synopsis of why the DMOZ was mentioned within them. In the conclusion, I list some of the more interesting of those, and the one that I liked the most.
Patents granted in 2006
Virtual directory (7,149,743)
– A virtual directory that overlays a number of other directories, including DMOZ, and would allow users to access the information in those other directories. Note: a visit to the URL results in a “site under construction” message.
Meta-document management system with transit triggered enrichment (7,117,432)
System with user directed enrichment and import/export control (7,133,862)
– These two patents appear to be related, and use the same text to describe how they would use DMOZ information – context information, from a source such as DMOZ, could be used to enrich query results.
System and method for distributed real-time search (7,099,871)
– DMOZ categories as query refinement options
System and method for multiple data sources to plug into a standardized interface for distributed deep search (7,013,303)
– Would offer categories from a source like DMOZ to searchers looking for information.
System and method for evaluating and enhancing source anonymity for encrypted web traffic (7,096,200)
– DMOZ was used as a source of information for an experiment to test the effectiveness of the process defined in this patent:
To demonstrate the effectiveness of source identification based on a statistical comparison of traffic signatures of encrypted Web traffic, the procedure of an actual study and its results are described below. In that particular study, traffic signature information is collected on a sample of just under 100,000 Web pages, from a wide range of different sites. The pages were obtained from the DMOZ Open Directory Project link database (http://dmoz.org), half of them chosen from various categories of “sensitive” site to which an adversary might be interested in spotting visitors, and the other half chosen randomly.
Identifying navigation bars and objectionable navigation bars (7,089,490)
– An Open Directory Project page was used as one of three examples to show how the process involved in rewriting and reconfiguring pages for smaller screens would work.
Interface and system for providing persistent contextual relevance for commerce activities in a networked environment (7,089,237)
– Used as an example, structure-wise, of a user-specific meta catalog that could be generated by the system described in this patent.
E-Nvent USA Inc.
Method and system for mapping and searching the Internet and displaying the results in a visual form (7,085,753)
– Mentioned as the directory that Google provides to users, rather than a directory generated from the search data that the search engine collects as it crawls the web.
Thebrain Technologies Corp.
Method and apparatus for sharing many thought databases among many clients (7,076,736)
– An example of a source of information that could be reorganized in a meaningful manner by using the process described within the patent:
These matrix-sharing methods could be ideal for an Internet search engine which is updated and kept current automatically by massively distributed “spider” computer programs each updating a common shared source of Brain data items of its findings. As of this writing, a service permitting Internet users to search for, and navigate amongst websites using the thought-link structure of the present invention is available to the public at the URL www.webbrain.com. That type of service can be created, in large part, by applying the automatic “spidering” techniques for matrix creation (described above) to a publicly available Internet directories such as the “Open Directory Project” (www.dmoz.com).
CNET Networks, Inc.
Apparatus and method for delivering information over a network (7,016,892)
– Using content from a source like the Open Directory Project to serve along with advertisements in response to a query.
Patent Applications Published in 2006
Categorizing objects, such as documents and/or clusters, with respect to a taxonomy and data structures derived from such categorization (20060242147)
– DMOZ as an example of a taxonomy that could be used in a probabilistic hierarchical inferential learner
Systems and methods for providing a graphical display of search activity (20060224938)
Systems and methods for managing multiple user accounts (20060224624)
Systems and methods for providing subscription-based personalization (20060224615)
Systems and methods for combining sets of favorites (20060224608)
Systems and methods for modifying search results based on a user’s history (20060224587)
Systems and methods for analyzing a user’s web history (20060224583)
– In a user profile for a personalized search system, described in a number of related patent applications, levels of interest might be assigned to topic categories:
For example, by examining one or more of the various events a user profile may be created indicating levels of interest in various topic categories (e.g., a weighted set of Open Directory Project (http://dmoz.org) topics).
Associating supplementary information with network-based content locations (20060224662)
– DMOZ used as an analogy for associating a page with a category.
Dispersing search engine results by using page category information (20060004717)
– Used as a source to help categorize web pages:
 A category tool 218 is linked to index 116, and identifies one or more categories for each of the retrieved web pages as a function of the parsed content (i.e., identified document data) and one or more external data sources such as the Open Directory Project (ODP) or index data regarding previously categorized web pages.
As known to those skilled in the art, the ODP is one of the most widely distributed databases of content classified by humans. For example, assume the ODP has categorized a web page having the URL www.gs.com under Business → Finance, and has categorized a web having the URL www.gs.com/venturecapital/ under Business → Finance → Entrepreneurship.
If the category tool 218 is given the URL www.gs.com/venturecapital/foo/bar.html to categorize, it will initially query the external data sources and/or index for a matching URL. If the URL www.gs.com/venturecapital/foo/bar.html is not found, the category tool 218 will then query the external data sources and/or index for www.gs.com/venturecapital/foo. Finally, if www.gs.com/venturecapital/foo is not found, the category tool 218 will check for www.gs.com/venturecapital and will assign the web page (i.e., www.gs.com/venturecapital/foo/bar.html) with the category Business → Finance → Entrepreneurship.
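The fallback lookup in that passage is easy to picture in code. Here is a minimal sketch in Python, under my own assumptions: the `ODP_CATEGORIES` table, the `categorize` function name, and the " > " separator are all made up for illustration, and the filing does not say how the lookup is actually implemented.

```python
from typing import Dict, Optional

# Hypothetical stand-in for directory data: URL prefix -> category path.
# The two entries mirror the example in the quoted passage.
ODP_CATEGORIES: Dict[str, str] = {
    "www.gs.com": "Business > Finance",
    "www.gs.com/venturecapital": "Business > Finance > Entrepreneurship",
}


def categorize(url: str, directory: Dict[str, str]) -> Optional[str]:
    """Return the category of the longest listed prefix of url, or None."""
    candidate = url.rstrip("/")
    while candidate:
        if candidate in directory:
            return directory[candidate]
        if "/" not in candidate:
            # Only the host is left and it is not listed.
            break
        # Drop the last path segment and try the shorter URL.
        candidate = candidate.rsplit("/", 1)[0]
    return None


category = categorize("www.gs.com/venturecapital/foo/bar.html", ODP_CATEGORIES)
print(category)
```

The loop tries the full URL first, then strips one path segment at a time, so a page inherits the category of its nearest categorized ancestor.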
Reputation of an entity associated with a content item (20060253584)
Indicating website reputations based on website handling of personal information (20060253583)
Indicating website reputations within search results (20060253582)
Indicating website reputations during website manipulation of user information (20060253581)
Website reputation product architecture (20060253580)
Indicating website reputations during an electronic commerce transaction (20060253579)
Indicating website reputations during user interactions (20060253578)
Determining website reputations using automatic testing (20060253458)
– When someone attempts to access unsafe content, they may be given a choice of alternatives based on category information from sources such as DMOZ, the Yahoo directory, and other places.
Plain Sight Systems, Inc.
System and method for document analysis, processing and information extraction (20060155751)
– Building a contextualized search by starting a crawl at certain DMOZ categories.
Automatic, personalized online information and product services (20060136589)
– DMOZ used as an example of a topic tree that could be generated from a user’s interests.
Method and apparatus for explaining categorization decisions (20060136410)
– A categorizing program may assign documents topics such as those found in the ontology used by the DMOZ.
Oculus Info Inc.
System and method for interactive multi-dimensional visual representation of information content and properties (20060116994)
– Filtering of sites listed in shopping categories within the Open Directory so that more informational results are shown:
 The results views 209 also allow the analyst to annotate documents. Annotated documents appear with a yellow pin 217. The analyst can also hide categories and documents (e.g. hide all documents from sites categorized as “Shopping” in the open directory project web site classification dimension). When one or more documents are hidden they will disappear from their position in all the result views. Instead they preferably appear in a separate hidden category (not shown). In this way the analyst can remove documents from consideration, though with the opportunity to restore them if necessary. The hidden category can also be set to avoid displaying its aggregate results, instead showing a single line indicating the number of results it contains.
Topixa, Inc., DBA Rawsugar
User interface for conducting a search directed by a hierarchy-free set of topics (20060059143)
Conducting a search directed by a hierarchy-free set of topics (20060059135)
Creating attachments and ranking users and attachments for conducting a search directed by a hierarchy-free set of topics (20060059134)
– DMOZ top level categories used as a source of topics for a search methodology.
 As mentioned above, according to one aspect of the invention, a registered user can add a topic to the set of topics, and can add one or more URLs attached to that topic. Therefore, in one embodiment, an initial set of topics is pre-defined. In one version, this set consists of the top level topics from the Open Directory Project (ODP), also known as DMOZ, run by Netscape Communication Corporation, Mountain View, Calif. For further information, see www.dmoz.org, and http://dmoz.org/about.html.
Accenture Global Services GMBH
Active relationship management (20060036461)
– As a source in attempting to determine whether organizations are related to each other.
Methods, apparatus and computer programs for characterizing web resources (20060026496)
– As an example of a resource that classifies websites rather than just web pages.
Methods and systems for interactive search (20060010117)
– DMOZ as an example of a web directory.
Ecosystem method of aggregation and search and related techniques (20060004691)
– To help assign topics to blog posts based upon outbound links, and if they appear in a directory like DMOZ:
 According to a specific embodiment, content may be classified based on links to an established topic directory or ontology, e.g., by looking at each piece of content and identifying outbound links and unusual phrases. An outbound link is then checked against an ontology (e.g., DMOZ (see http://dmoz.org/) or any other suitable ontology) and based on the link pattern, the content is automatically tagged as inside of that particular category. Then, a relevance weight may be assigned to the document with reference to the author’s relative authority inside of that category (see below) as well as inbound links to that document inside of that category. This weight may further incorporate self-categorization, (e.g. “tags”) of blogs and posts.
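To make the idea concrete, here is a rough sketch in Python of tagging a post from its outbound links. Everything in it is my own assumption: the `DIRECTORY` table, the `tag_post` name, and especially the majority-vote rule, since the filing only says the content is tagged "based on the link pattern" and does not spell out how. It also leaves out the relevance weighting the passage mentions.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical directory data: linked host -> topic.
DIRECTORY: Dict[str, str] = {
    "python.org": "Computers/Programming",
    "rust-lang.org": "Computers/Programming",
    "espn.com": "Sports",
}


def tag_post(outbound_links: List[str], directory: Dict[str, str]) -> Optional[str]:
    """Tag a post with the topic most of its listed outbound links fall under."""
    # Links that are not in the directory are simply ignored.
    counts = Counter(directory[link] for link in outbound_links if link in directory)
    if not counts:
        return None
    # Assumed rule: the most frequent topic wins.
    return counts.most_common(1)[0][0]


links = ["python.org", "espn.com", "rust-lang.org", "unknown.example"]
tag = tag_post(links, DIRECTORY)
print(tag)
```

A post with no links found in the directory gets no tag at all, which is one obvious limit of the approach.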
Computer Graphic Display Visualization System and Method (20060288023)
– Mentions the use of thebrain.com’s search service for DMOZ.
An alternate hierarchal representation of information is provided by TheBrain.com, Santa Monica, Calif. 90404, www.thebrain.com, which has developed a dynamic information presentation applet showing hierarchal links between data elements, which may include hyperlinks to associated resources. More recently, TheBrain.com has developed an Open Directory Project search service for presenting search results within their applet framework.
Large-scale metasearch engine (20060184514)
– Used as a source of data to find pages for an experiment.
 An experiment was carried out to evaluate the Search Engine Discovery Component of the instant invention. The experiment included the following steps.
 1. The RDF dump from http://dmoz.org was downloaded. DMOZ is said to be the largest human-edited directory, containing millions of Webpages. A total of 519 Webpages are collected as a result of random selection, each having at least one form.
 2. A manual check revealed that 307 of the 519 pages contain at least one search engine form.
 3. The discovery program reported 286 search pages from the same collection of 519 Webpages.
 4. 286 URLs appeared in both the manual check and the report from the discovery program. 21 URLs were listed only in the manual check, meaning that the search engine discovery component missed 21 search engines. There was no misclassification. The discovery success rate is 93% (286/307).
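The numbers in that experiment are easy to check. A short Python sketch of the arithmetic follows; the variable names are mine, and "recall" and "precision" are my labels for what the filing calls the discovery success rate and the absence of misclassification.

```python
# Figures taken from the quoted experiment.
manual_positives = 307  # pages with a search engine form, per the manual check
reported = 286          # pages the discovery program reported
in_both = 286           # URLs found by both the manual check and the program

missed = manual_positives - in_both   # search engines the program did not find
recall = in_both / manual_positives   # the filing's "discovery success rate"
precision = in_both / reported        # 1.0 means no misclassification

print(missed, round(recall * 100), precision)
```

The 21 missed pages and the 93% rate in the filing both follow from those three counts.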
Method for search result clustering (20060117002)
– As fill-in categories for clustering:
 Flexible combinations of keyword classes and other class identifiers may be used. For example, document classes from a conventional classification system (such as a web page directory like the Open Directory Project, http://www.dmoz.com) can be used as the KWAC classes of a document associated with some index keyword(s) when there are no appropriate keywords that are related to the index keyword(s) in the document.
Process for matching vendors and users of search engines so that more valuable leads are generated for vendors (20060074890)
– In searching for vendors in response to a search query, DMOZ might be one of the resources used to find those merchants.
 This method can be applied not only to search-engines that index the entire web, but also to engines that cover only a portion of the web, and even engines that are restricted to just one web-domain. It can also be used in directories (such as dmoz) that contain vendor descriptions in addition to other general non-commercial content.
The patent filings mentioning the Open Directory Project range from using the DMOZ as an example of a directory or ontology on the web, to a source for experiments, to some even more interesting uses. Here’s a quick summary of some of those.
1. Microsoft used DMOZ data in an experiment regarding encryption of web traffic.
2. Thebrain Technologies Corp. created a search interface for the Open Directory Project that allows the data within it to be organized differently.
3. McAfee, in a family of patent applications, would use content like that from the DMOZ as a safe alternative to unsafe sites.
4. Plain Sight Systems, Inc. writes of using the DMOZ as a seed starting point for web crawling for a contextualized search.
5. Oculus Info Inc. would consider filtering pages found in DMOZ shopping categories out of the results of informational searches.
6. Technorati would look at shared outbound links from blog posts and the DMOZ to help provide categories to posts.
I’m not sure how well it would work, but the Technorati use of DMOZ information was the most interesting to me. The intent behind an outbound link may not match the DMOZ category of the page it points to, but it’s a creative use, and I’d like to see how effective it might be.