Search Engine Categorization
If you have a website that classifies products or services or pages into different areas, and your products might be offered in a shopping search engine or other services that draw information from multiple websites, how you classify what you offer may play a role in how that shopping search engine classifies or creates new classifications when it displays your products or services or pages.
A Yahoo patent application describes an automated process in which items that have already been placed in different sets of categories can be assigned to other, broader categorization schemes.
These broader category schemes could be for product search, for advertisements, for user-tagged items such as photos, for services such as job listings, as well as other areas where there are many websites that have their own unique categorization systems.
The Value of Categories
Search Engine Categorization helps us find specific pieces of information more quickly by letting us organize that information in meaningful ways and identify the categories it belongs to, so that we can search within those categories.
There are different styles of categorizing information, which can vary from person to person. Some may use a single-tier categorization scheme. Others might prefer a more complex, hierarchical categorization scheme using “parent” categories which may have one or more “child” categories.
Also, some categorization schemes may be coarse-grained with relatively few categories with relatively many members, or fine-grained with many more categories with relatively fewer members.
We see a lot of different categorization methods on many different websites for pages, products, and services. There are times when it can be helpful to organize information that may be found in those different categorizations into a single unified categorization scheme.
For example, a search engine that helps shoppers find a specific product might have to process data about products from multiple merchants, each with its own categorization scheme. Serving that information in search results to millions of customers means finding some way of handling those different categories from different merchants.
Examples of Classification Contexts
The patent filing primarily focuses upon an example in the context of a product classification system, but the techniques described are applicable to any context in which things that have already been categorized in one domain are assigned to categories in another domain.
Source categories are the original categories of products or services or things on the original pages, and target categories are the classifications that have been created to fit into the larger system that aggregates the source categories. Here are some other contexts where this method can be used:
Job sites have their own classification systems. To build a larger job site that includes listings from multiple job sites, there needs to be a way of creating a larger categorization scheme, which may use the source categories from the original sites as a factor in determining the target categories that each job listing will go into on the site aggregating the others.
Likewise, sites that include a lot of documents classified into different categories, which could be listed in a site that includes documents from many different sites, may use the original categories in the creation of new categories for the larger site.
A web portal may want to make sure that ads displayed by web pages relate to the subject matter of the web pages, to maximize the effectiveness of those advertisements.
That could be done by assigning ads to target categories in a target scheme and assigning web pages (which may contain search results) to the same target categories in the same target scheme. When a web page is requested, the target category of the web page can be determined, and advertisements from that target category may be selected.
So, advertisements categorized by advertisers could use those source categories to assign the advertisements to target categories. Web pages (or search queries) that have been categorized based on one or more source schemes, could use the source categories associated with the web pages or queries to assign target categories to them.
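The ad-matching idea above can be sketched as a simple lookup, once both ads and pages carry a target category. This is only an illustrative sketch: the function name `select_ads`, the category names, and the ad identifiers are all hypothetical and do not come from the patent filing.

```python
def select_ads(page_target_category, ads_by_target_category, limit=2):
    """Return up to `limit` ads that share the page's target category."""
    return ads_by_target_category.get(page_target_category, [])[:limit]

# Hypothetical data: ads already assigned to target categories.
ads_by_target_category = {
    "Baby Gear": ["stacking-ring-ad", "stroller-ad", "car-seat-ad"],
    "Automotive": ["hood-ornament-ad"],
}

# A requested page whose target category was determined to be "Baby Gear".
chosen = select_ads("Baby Gear", ads_by_target_category)
```

The hard part, assigning ads and pages to the shared target categories in the first place, is what the rest of the patent filing addresses.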
User Tagging of Events, Photos, etc.
A number of sites now allow computer users to categorize items (events, photos, etc.) by assigning tags to them. These tagging systems don’t usually require any particular tagging scheme, but allow those users the flexibility to develop their own classifications (or source categories).
There may be some benefits to imposing a unified tagging scheme (target categories) on these items that have been tagged by a diverse and non-uniform set of taggers.
The Yahoo Category Patent Application
Assigning into one set of categories information that has been assigned to other sets of categories
Inventors: Byron Edward Dom, Hui Han, Ramnath Balasubramanyan, Dmitry Yurievich Pavlov
US Patent Application 20070214140
Published September 13, 2007
Filed: March 10, 2006
Techniques are described for assigning, to target categories of a target scheme, items that have been obtained from a plurality of sources. In situations in which one or more of the sources has organized its information according to a source scheme that differs from the target scheme, the assignment may be based, in part, on an estimate of the probability that items from a particular source category should be assigned to a particular target category.
Such probability estimates may be based on how many training set items associated with the particular source category have been assigned to the particular target category. Source categories may be grouped into clusters.
The probability estimates may also be based on how many training set items within the cluster to which the particular source category has been mapped, have been assigned the particular target category.
Assigning Items from Multiple Sources (and Source Categories) into Target Categories
When sources have organized their information in source schemes different from the target scheme, assignment of categories may be based, in part, on “category-to-category probabilities”. This means an estimate of the probability that items from a particular source category should be assigned to a particular target category.
An item from source category X may be assigned to target category Y if 90% of all previous items from source category X have been assigned to target category Y.
Or, the same item may be assigned to a different target category (e.g. target category Z) if only 10% of all previous items from source category X have been assigned to target category Y.
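As a rough illustration of the 90% and 10% example, category-to-category probabilities can be estimated by counting how previously assigned items were distributed. This is a minimal sketch that assumes plain relative frequencies; the patent filing says assignment is based only in part on such estimates, and the function name and training data here are hypothetical.

```python
from collections import Counter, defaultdict

def category_probabilities(training_pairs):
    """Estimate P(target | source) from (source_category, target_category) pairs."""
    counts = defaultdict(Counter)
    for source, target in training_pairs:
        counts[source][target] += 1
    return {
        source: {t: n / sum(targets.values()) for t, n in targets.items()}
        for source, targets in counts.items()
    }

# Hypothetical training set: 9 of 10 items from source category X went to Y.
training = [("X", "Y")] * 9 + [("X", "Z")]
probs = category_probabilities(training)

# Pick the target category with the highest estimated probability.
best_target = max(probs["X"], key=probs["X"].get)
```

With these counts, a new item from source category X would lean toward target category Y, though a real system would presumably weigh this estimate alongside other evidence about the item.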
Source Categories with Same Names and Different Meanings
Sometimes different source sites apply different meanings to the same category name.
An “ornament” from a car dealership is likely to be very different from an “ornament” from a Christmas store.
So, the source of an item may be treated as a component of the source category. Thus, “category X from source A” is treated as one category, and “category X from source B” is treated as another category.
This way, the category-to-category probability that an “ornament” item from a car dealership should be assigned to a particular target category will not be affected by “ornaments” from a Christmas store being assigned to that particular target category.
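One way to picture treating the source as part of the source category is to key the counts on a (source, category) pair rather than on the category name alone. The sketch below assumes that approach; the source names, target category names, and helper functions are made up for illustration.

```python
from collections import Counter, defaultdict

def build_counts(training):
    """Count target assignments per (source, source_category) key."""
    counts = defaultdict(Counter)
    for source, source_category, target_category in training:
        counts[(source, source_category)][target_category] += 1
    return counts

def probability(counts, source, source_category, target_category):
    """Relative frequency of a target category for one (source, category) key."""
    seen = counts.get((source, source_category), Counter())
    total = sum(seen.values())
    return seen[target_category] / total if total else 0.0

# Hypothetical training items: (source, source category, target category).
training = [
    ("car_dealership", "ornament", "Auto Accessories"),
    ("car_dealership", "ornament", "Auto Accessories"),
    ("christmas_store", "ornament", "Holiday Decor"),
    ("christmas_store", "ornament", "Holiday Decor"),
    ("christmas_store", "ornament", "Holiday Decor"),
]
counts = build_counts(training)
p_car = probability(counts, "car_dealership", "ornament", "Holiday Decor")
p_xmas = probability(counts, "christmas_store", "ornament", "Holiday Decor")
```

Because the two “ornament” categories are counted separately, the Christmas store's assignments leave the car dealership's estimate for "Holiday Decor" at zero.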
The source categories may be mapped to source category clusters and cluster-to-category probabilities are used to assign items to target categories. A cluster-to-category probability represents the likelihood that an item that maps to a particular source category cluster should be assigned to a particular target category.
Assume that the source categories X, Y and Z all map to source category cluster C. The cluster-to-category probability that an item from any of source categories X, Y and Z should be assigned to a particular target category B may be based on how many previously assigned items from source categories X, Y and Z were assigned to target category B.
If many of the items from categories Y and Z were assigned to category B, then the cluster-to-category probability that an item from source category X should be assigned to category B may be high, even if few or no items from category X have been assigned to category B.
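The X, Y and Z example can be sketched by pooling counts across every source category that maps to the same cluster. This assumes the mapping from source categories to clusters is already given (how clusters are formed is not covered here), and the counts and function name are hypothetical.

```python
from collections import Counter

def cluster_probability(training, cluster_of, cluster, target):
    """Share of training items in `cluster` that were assigned to `target`."""
    counts = Counter()
    for source_category, target_category in training:
        if cluster_of.get(source_category) == cluster:
            counts[target_category] += 1
    total = sum(counts.values())
    return counts[target] / total if total else 0.0

# Source categories X, Y and Z all map to cluster C (assumed mapping).
cluster_of = {"X": "C", "Y": "C", "Z": "C"}

# Hypothetical training set with no items at all from source category X.
training = [("Y", "B")] * 8 + [("Z", "B")] * 7 + [("Z", "A")] * 5

p_cluster_b = cluster_probability(training, cluster_of, "C", "B")
```

Here 15 of the 20 pooled items went to target category B, so an item from source category X inherits a 0.75 estimate for B even though no X items appear in the training set.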
Using Feeds for Categorization
If we look at feeds used for product categorization, we see how source categories may be supplied to a product search engine.
Here’s an example of a feed that might be supplied to a search engine:
<merchantCategory:Baby and toddler nursery Nursery Themes>
<description:Ring around the stroller! Its an entertaining travel toy . . . its a coordination-building stacking ring. Twist this flexible friend around a stroller or car seat bar and baby will eagerly explore its rattles ribbons and crinkly sounds. Then take it off the stroller bar and use it as a stacking ring that challenges babys fine motor skills. For ages 6 months and up. Imported.>
<mid_pid: 1012440 10413>
A product categorization system that uses this kind of feed may be designed so that every offer must have values for title, merchant id (mid), sku and price, while fields such as description and merchant category (mc) are optional.
The merchant category is from the merchant’s categorization scheme, and is not the categorization scheme used by the search engine. It is the “source category” and not the category used in the target scheme.
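To show how a source category might be pulled out of a feed entry like the one above, here is a small parsing sketch. It assumes each field is written as `<name:value>`, which is inferred only from the sample shown; the real feed syntax may differ. The description in the sample is abbreviated, and the function name is hypothetical.

```python
import re

# Assumed field syntax, inferred from the sample: <name:value>
FIELD = re.compile(r"<\s*([A-Za-z_]+)\s*:\s*(.*?)\s*>", re.DOTALL)

def parse_offer(text):
    """Return a dict of field name -> value for one feed entry."""
    return {key: value for key, value in FIELD.findall(text)}

# Sample entry from the article, with the description shortened.
sample = """<merchantCategory:Baby and toddler nursery Nursery Themes>
<description:Ring around the stroller!>
<mid_pid: 1012440 10413>"""

offer = parse_offer(sample)

# The merchant category is the source category handed to the classifier.
source_category = offer.get("merchantCategory")
```

The source category string extracted this way could then be combined with the merchant identity when estimating which target category the offer belongs in.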
The patent application goes into a lot of detail regarding how categories from different sources could be combined into a much smaller and more focused number of target categories, within the context of a shopping search engine.
I think what’s important here isn’t so much knowing how this kind of classification happens as much as knowing that it can happen in a lot of different contexts.
There are a lot of different services provided by search engines which use categories, from business listings in local search to product search, from jobs of many types to advertisements, from date matching services to Question Answering (see the list of categories on Yahoo Answers).
The way that items are classified on source sites may influence which categories those items are placed in on a search engine site that aggregates them. It may also influence the creation of those target categories.