How Using Categories for Queries Can Help Searchers, Writers, and Search Engines

You search for “Foo Fighters,” and the search engine takes your query and starts searching its databases to identify results. It might look through a video database, to see if there are any good videos to show you. It may dig through a News database to see if there was any recent news tied to the phrase, or an Image database to see if there were any popular pictures of the band. The search engine may see if any advertisers were running campaigns using the band’s name.

Some of that searching is done by taking the exact phrase that you used in your search, “Foo Fighters,” and finding a set of results that you might be satisfied with. But there are steps that a search engine could take that might give you even better results.

Associating Search Terms with Categories

The search engine might attempt to associate your search term with categories that can provide you with richer additional results, shortcuts to helpful alternative searches, and other information that might either broaden or focus the search results that you see. Using categories might help broaden results if there aren’t very many results for the terms you searched with, and it may focus results if there are too many results for the term that you used in your search.

Using categories in such a way might enable the search engine to offer related concepts to a search for “Foo Fighters,” such as “Dave Grohl,” or “tickets,” or “tour dates,” or “Nirvana.”

Putting your search phrase within a hierarchical set of categories might also help the search engine show more relevant or more diverse advertisements.

Why Use Categories?

When someone searches, the intent behind their search can be really difficult to determine, especially since most searches are fairly short – often only a couple of words – and carry little context of their own.

When a search engine associates a category label with a query, the context and the user intent might be a little easier to understand.

In my example with the Foo Fighters, a search engine could just try to find pages that use the phrase, rank them, and display those results to a searcher. Or, it could place the query into a category, and look in associated hierarchical categories to find relationships with other related terms.

A high level of hierarchy could be fairly general, such as entertainment, travel, sports, etc., and would be followed by lower levels of hierarchy with categories that get more specific, such as a second level hierarchy containing the category “music,” a third level hierarchy containing the category “genre,” a fourth level hierarchy containing the category “band,” a fifth level hierarchy containing the category “albums,” a sixth level hierarchy containing the category of “songs,” etc.

Placing the query “Foo Fighters” into such a hierarchical set of categories, and seeing what else is located within related categories enables both organic search and paid search to consider the names of band members, albums and songs from the band, bands that perform in the same genre, and much more. That kind of categorization can provide more context and more information that could help in determining the intent of someone searching for Foo Fighters.
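One way to picture that kind of lookup is a small nested structure. The sketch below is only an illustration, not anything taken from the patent filing: the tree contents, the “rock” genre label, and the function names `find_path` and `related_terms` are all my own assumptions. It places a query in the hierarchy and then gathers its siblings and the entries beneath it.

```python
# Hypothetical category tree. The levels follow the ones described above
# (entertainment > music > genre > band > ...); the entries are illustrative.
CATEGORY_TREE = {
    "entertainment": {
        "music": {
            "rock": {
                "Foo Fighters": {
                    "albums": {"The Colour and the Shape": {}},
                    "members": {"Dave Grohl": {}},
                },
                "Nirvana": {},
            }
        }
    }
}


def find_path(tree, target, path=()):
    """Return the tuple of labels leading down to target, or None if absent."""
    for label, children in tree.items():
        current = path + (label,)
        if label == target:
            return current
        found = find_path(children, target, current)
        if found:
            return found
    return None


def related_terms(tree, target):
    """Collect the siblings of target plus everything nested beneath it."""
    path = find_path(tree, target)
    if path is None:
        return []
    # Walk down to the parent node of the target.
    node = tree
    for label in path[:-1]:
        node = node[label]
    siblings = [label for label in node if label != target]
    # Gather every label below the target, at any depth.
    descendants = []
    stack = [node[target]]
    while stack:
        current = stack.pop()
        for label, children in current.items():
            descendants.append(label)
            stack.append(children)
    return siblings + sorted(descendants)


print(find_path(CATEGORY_TREE, "Foo Fighters"))
print(related_terms(CATEGORY_TREE, "Foo Fighters"))
```

A real system would presumably hold far more entries and weight them in some way; the point here is only that once a query has a place in the hierarchy, the neighboring and lower-level categories are cheap to reach.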

A patent application from Yahoo explores how a searcher’s query terms can be classified into categories, and how related query terms in associated higher and lower level categories can be found and used to return more meaningful search results, suggestions, recommendations, and advertising to searchers.

System for classifying a search query
Invented by Xiaofei He, and Pradhuman Dasharalhasinh Jhala
Assigned to Yahoo
US Patent Application 20080183685
Published July 31, 2008
Filed: January 26, 2007

How Does This Categorization Of Queries Work?

A rough overview might be:

Your query phrase is submitted to the search engine, and the query is reviewed to see if it has been classified before.

If it has, it is assigned the category label, and if it hasn’t, then a category is calculated for it.

To calculate that category, Web pages are returned in response to the query, and a predefined number of the top returned web pages are identified to “represent” the query.

A model about related terms and information that is meaningful to the original query is created by looking at how the original term is used on the web pages, and how other terms found on those pages are used.

Some of the other terms found on the web pages may be filtered out of the process for one reason or another. For instance:

a) Unwanted terms and/or symbols, and numbers, may be removed to decrease the “amount of noise.” These could be things like “articles, prepositions, conjunctions, etc., e.g. ‘the’, ‘a’, ‘with’, ‘of’, etc.”

b) Terms that are based upon the same root or stem might be removed because they could be considered duplicates of each other (sing, sang, sung, singing, etc.).

c) A term might be reduced to the simplest version in which it can be presented (a “standard” canonical form) by removing prefixes, suffixes, plural designations, and so on, so that only one version of the term from the web pages is related to the original query term.
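The three filtering steps above could be sketched roughly as follows. This is a minimal illustration under my own assumptions: the stop word list, the suffix list, and the names `naive_stem` and `normalize_terms` are hypothetical, and the suffix stripping is a crude stand-in for a real stemmer. It would not connect irregular forms such as “sang” and “sung” to “sing.”

```python
import re

# Assumed stop word list; a production list would be much longer.
STOP_WORDS = {"the", "a", "an", "with", "of", "and", "or", "in", "on", "to"}
# Assumed suffixes to strip, checked in this order.
SUFFIXES = ("ing", "es", "s")


def naive_stem(word):
    """Strip one common suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize_terms(text):
    """Drop symbols, numbers and stop words, then stem and de-duplicate."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = []
    for token in tokens:
        # Skip anything containing digits, and skip stop words (step a).
        if not token.isalpha() or token in STOP_WORDS:
            continue
        # Reduce to a single simplified form (step c).
        stem = naive_stem(token)
        # Keep only one term per stem (step b).
        if stem not in terms:
            terms.append(stem)
    return terms


print(normalize_terms("The band sings songs, singing 2 albums of the 1990s!"))
# prints ['band', 'sing', 'song', 'album']
```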

The other terms found on the web pages may have categories assigned to them already. Based upon how frequently those other terms are used on those web pages, and other criteria about the terms, the original query term may be placed into a category, or assigned a new category.
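As a rough sketch of that last step, the snippet below lets the already-categorized terms found on the top pages “vote” for a category, after first checking whether the query has been classified before. The function name `classify_query`, the `min_share` threshold, and the sample data are assumptions of mine; the patent application describes frequency “and other criteria,” which this simple count does not attempt to reproduce.

```python
from collections import Counter


def classify_query(query, page_terms, term_categories, known_labels, min_share=0.5):
    """Return a category for query, or None if no category clearly wins.

    page_terms: one list of (already filtered) terms per top-ranked page.
    term_categories: hypothetical lookup of term -> existing category label.
    known_labels: queries classified earlier; updated when a label is assigned.
    """
    # Reuse an earlier classification if there is one.
    if query in known_labels:
        return known_labels[query]

    # Each occurrence of a categorized term counts as one vote.
    votes = Counter()
    for terms in page_terms:
        for term in terms:
            category = term_categories.get(term)
            if category is not None:
                votes[category] += 1

    if not votes:
        return None  # nothing to go on; a new category might be needed

    category, count = votes.most_common(1)[0]
    if count / sum(votes.values()) < min_share:
        return None  # no category dominates the pages
    known_labels[query] = category
    return category


term_categories = {"guitar": "music", "album": "music", "tour": "music", "stadium": "sports"}
pages = [["guitar", "album", "stadium"], ["tour", "album"]]
labels = {}
print(classify_query("foo fighters", pages, term_categories, labels))
# prints music
```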

How We Use Categories

We perform a similar kind of categorization on our own.

For example, if you think about the old blues musician Muddy Waters, you might create a number of categories for him:

He was a musician.
He played a genre of music known as the blues.
The kind of blues he played is a sub genre known as Chicago Blues.
He was a guitarist.
He was a blues innovator, playing the electric guitar.
He was a singer.
He was born in Mississippi.
He was a resident of Chicago.
He was a resident of Illinois.
He was a strong influence on rock bands like the Rolling Stones.

If I wanted to perform searches for Muddy Waters, I could expand my queries by looking at related terms within those categories, and many more that I could create. In essence, the search engine is trying to do the same thing, by identifying other terms that appear on pages that rank well for the original query, and seeing what categories those other terms fit within, to assign a category for the original query. Then it explores those other categories, including the higher (or broader) and lower (or more specific) level ones, to find related terms and to expand search results, offer suggestions, and provide advertising.

Conclusion

The patent application goes into a great amount of detail on the process involved in assigning category labels to query terms, including many examples, and explores how user data can be incorporated in the process to check up on the semantic relationships between terms identified in this classification process.

For example, if people searching for “Muddy Waters” click on links to pages about the Chicago Blues, and people searching for another bluesman, Howlin’ Wolf, click on the same links, that provides some verification that both individuals (and queries using their names) belong to that category, and that both are related.
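A simple way to put a number on that kind of shared click behavior is to compare the sets of pages clicked for each query. The sketch below uses Jaccard similarity, which is my own choice of measure rather than anything named in the patent application; the function name `click_overlap` and the example.com URLs are hypothetical.

```python
def click_overlap(clicks_a, clicks_b):
    """Jaccard similarity of two collections of clicked URLs (0.0 to 1.0)."""
    a, b = set(clicks_a), set(clicks_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up click logs for two queries.
muddy_waters = ["example.com/chicago-blues", "example.com/muddy-waters"]
howlin_wolf = ["example.com/chicago-blues", "example.com/howlin-wolf"]

# One shared page out of three distinct pages.
print(round(click_overlap(muddy_waters, howlin_wolf), 2))
# prints 0.33
```

A higher score would suggest that the two queries belong in the same category; where to draw the line is a tuning decision that this sketch leaves open.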

Thinking about categories that could be created for queries can be helpful to searchers, to people who write content for web sites, and to search engines trying to deliver search results to people.

Regardless of whether you’re a searcher, a writer, or an indexer of web pages, classifying and categorizing topics and queries can be a helpful process to use in finding information, creating new information, or helping others to locate something. It makes sense for a search engine to explore how categories can help it provide results.

Do you do something similar with categories when you search on the Web, or when you create content for Web pages?


11 thoughts on “How Using Categories for Queries Can Help Searchers, Writers, and Search Engines”

  1. Wow, that post was very interesting. The way you go into such depth is probably the reason you’re one of the best SEO blogs out there. I’ve been trying to make better use of categories on my blogs… I think a lot of people overuse the feature.

  2. Thanks for your kind words, Chris.

    Assigning categories to blog posts can be difficult. I’ve struggled with that myself – I’m trying to only assign one category per post, and often a post will fit pretty easily into more than one.

    I’ve seen blogs that have hundreds of categories, and blogs that only have a handful. How helpful are categories to people who visit your pages, or to search engines? How many might be too many? Or too few? Are there any best practices for blog categories, or is it often a question of taste? It’s a topic that could probably fill more than a few blog posts.

  3. That is really good information about searching sites on the internet. By using categories you could find information much more easily, and you don’t have to look through many pages to get the right information. So that’s a great blog about search engines. I have enjoyed reading it, and it will really help me.

  4. Hi bergmann,

    Thank you.

    It is a different way of thinking about how searches and search engines work, with categories assigned to search terms, and looking at terms that might be related in associated categories.

    I’ve been using hierarchical categories for years when creating pages for sites, and thinking about how the sites should be organized, and what the content on pages should include, including specific words and phrases.

    It’s interesting to see how search engines may be assigning categories to search phrases.

  5. Very weird, but I did notice that people use categories more on an e-commerce site to find what they are looking for than on a blog. Perhaps these different types of sites attract a different audience?

    What do you think about the usage of a tag cloud instead of categories?

    Dave

  6. Very nice post about searches and how search engines can behave or react. Depending on the system and new technologies, I am expecting very wise search engines and searchers. I can say that it is a smarter way to let your site be found by search engines. I often use many categories to let search engines identify my posts in a broader search, and I agree with every word in this post.

  7. Hi Dave,

    A smart use of categories on an ecommerce site makes a lot of sense. Amazon.com is a good example of an ecommerce site that uses a wide range of categories, including hierarchical categories.

    Tag clouds are interesting, but they don’t really take advantage of the kind of hierarchy that I described above. I do like them as one way of letting people know about what they might find upon a site.

    Hi zctglassman83,

    Thank you. The use of categories may help provide more context and insight into the intent of a search than just a straightforward keyword matching based search. The categories that you assign your blog posts might help a search engine match your page up with a query that has been similarly categorized, especially if the content of the query matches well with the category that you placed your post within.

  8. Here is something very interesting to share. While searching many terms (which are truly related to products), SERPs always show blogs. I think blogs are more popular and informative, but does it mean that we only need information or reviews? On the other hand, don’t you think that the people who own those sites are losing a lot of business? Maybe in the near future, we shall be selling the stuff via our blogs rather than sites. I think blogs are far easier to optimize, but don’t you think that they are destroying our real life sites?

  9. Hi buy wii,

    I think that a search engine also will categorize by site type – news, blogs, ecommerce site, music search results, book search results, and so on, and may attempt to provide some diversity in the search results it displays. That may mean that the top ten results may be a mix of site types, including blogs.

    Research coming out from Google and other search engines shows that most queries tend to be ones where the searcher is looking for information rather than to complete a transaction, so it isn’t surprising that sites focusing upon providing information appear in the top search results.

    For many of us who create content on the web on a regular basis, our “real life” sites are our blogs. And many businesses are adding blogs to their sites, to provide information and inspire conversation and communication with their customers. I think in most cases that is a good thing to see happening.

  10. Sometimes I find myself very helpless and depressed. I think all blogs should fall in the blog tab of Google, but they only have their own hosted blogs. That is some kind of drawback. We don’t search for information all the time, at least.

  11. Hi Zach,

    I think that you’re saying that blogs shouldn’t be included with web pages in Google’s web search. I can understand your perspective, but I’m not so sure that I agree.

    I was faced with the opposite problem on a number of searches yesterday. I wanted to find information, and the search results I was seeing for the topic I was searching for were heavily dominated by ecommerce results. I would have been happy to see some blog posts related to what I was looking for.

    Maybe it would help both of us if search results from Google were sometimes a little more diverse.
