How Google May Classify Pages Using Hierarchical Categories in URLs

Google was granted an updated version of a patent this week that looks at how the search engine might use directories in URL structures to help it better understand hierarchical categories on a Web site and to categorize new pages and directories that might be added to a site. The patent tells us that this might enable the search engine to add supplemental information to pages, such as advertisements that fall within the categories displayed upon the site.

Some other patents I’ve written about in the past show that the search engines might be doing more with hierarchical categories than just deciding which ads to show on a page.

Imagine that you have a site about car parts, and you decided to organize the pages of the site first by car make, so the main categories on your site are different brands, and your second level of directories is organized by car models. You might then have sub-sub-categories that are organized by different systems within cars, such as “electrical,” “transmission,” “cooling,” “suspension,” and so on. URLs for a couple of your pages might look like:

“http://www.mycar.com/ford/fiesta/cooling/”
“http://www.mycar.com/ford/mustang/cooling/”

Chances are good that everything displayed within this directory is about radiators and hoses and so on, for Ford Fiestas in the first URL, and Ford Mustangs in the second URL. Imagine that Google uses this site to create hierarchical categories for cars. Ford comes out with a new model of car, and this website and many other car websites add the new model, and they create new directories for their sites. Let’s call the new model, “Flash”. The hypothetical site might create URLs like:

“http://www.mycar.com/ford/flash/cooling/”

Because Google has already built a category model, adding a category for this new model of car doesn’t require rebuilding the hierarchical categories from scratch. Instead, the new category is added to the model that already exists.
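To make that idea concrete, here is a minimal sketch in Python of treating URL directories as a category tree and slotting a new directory into it. This is only an illustration of the concept, not the method described in the patent; the function names (path_categories, add_to_tree) and the nested-dict representation are my own assumptions.

```python
from urllib.parse import urlparse


def path_categories(url):
    """Return the directory names in a URL path, top level first."""
    return [part for part in urlparse(url).path.split("/") if part]


def add_to_tree(tree, url):
    """Insert a URL's directories into a nested dict.

    Returns the category paths that had to be created, so the caller can
    see what is new without rebuilding the whole tree.
    """
    added = []
    node = tree
    trail = []
    for name in path_categories(url):
        trail.append(name)
        if name not in node:
            node[name] = {}
            added.append("/".join(trail))
        node = node[name]
    return added


# Build the tree from the two URLs the site already has (hypothetical site).
tree = {}
for known in (
    "http://www.mycar.com/ford/fiesta/cooling/",
    "http://www.mycar.com/ford/mustang/cooling/",
):
    add_to_tree(tree, known)

# The new model only adds nodes under the existing "ford" branch.
new_nodes = add_to_tree(tree, "http://www.mycar.com/ford/flash/cooling/")
print(new_nodes)
```

The existing “ford” branch is reused, and only the “flash” directory and its “cooling” sub-directory are created, which is the sense in which nothing has to be rebuilt from scratch.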

The patent describes how such a model might be created, and how new items might be placed within it. The patent is:

Training set construction for taxonomic classification
Invented by Philo Juang, Christopher Testa, and Nicolaus Mote
Assigned to Google
US Patent 8,484,194
Granted July 9, 2013
Filed: January 13, 2012

Abstract

A training set generator may be configured to input a taxonomy including a hierarchy of categories and a plurality of top-level sites, and to output a training set of categorized data. The training set generator may include a crawler configured to crawl each of the top-level sites to determine at least one lower-level site associated therewith and to store the top-level sites and associated lower-level sites as crawl data.

The training set generator also may include an extractor configured to determine, for each of the top-level sites, a corresponding site-specific extraction template associating at least one portion of the corresponding top-level site with at least one category of the hierarchy of categories, and further configured to apply each site-specific extraction template to corresponding crawl data to thereby associate the crawl data with the categories of the hierarchical categories and obtain categorized data of the training set.
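One way to picture a “site-specific extraction template” is as a rule that says which directory position on a given site corresponds to which level of the taxonomy. The sketch below is a loose illustration under that assumption, using the car parts example; the template format, the level names (“make”, “model”, “system”), and the function names are hypothetical and are not taken from the patent.

```python
from urllib.parse import urlparse

# Hypothetical template for one site: position in the URL path -> taxonomy level.
MYCAR_TEMPLATE = ("make", "model", "system")


def apply_template(template, url):
    """Label one crawled URL with taxonomy levels, or return None if it does not fit."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) != len(template):
        return None
    return dict(zip(template, parts))


def build_training_set(template, crawled_urls):
    """Keep only the URLs the template can label, paired with their categories."""
    rows = []
    for url in crawled_urls:
        labels = apply_template(template, url)
        if labels is not None:
            rows.append((url, labels))
    return rows


# Stand-in for crawl data gathered from one top-level site.
crawl_data = [
    "http://www.mycar.com/ford/fiesta/cooling/",
    "http://www.mycar.com/ford/mustang/cooling/",
    "http://www.mycar.com/about/",
]
training_set = build_training_set(MYCAR_TEMPLATE, crawl_data)
for url, labels in training_set:
    print(url, labels)
```

Pages that do not match the template, such as the “about” page here, are simply left out of the categorized data in this sketch. A real system would presumably need a different template for each site, since each site organizes its directories differently.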

What makes it interesting to me is some of the other patents I’ve seen from Google over the past few years that describe how categories might be used to do things like create sitelinks for pages in search results as well as re-ranking search results based upon the categories of pages and matching categories for queries. I’ve written about those here:

As I noted in the last of those:

A page (or query) about “Flamingos” might fall within a categorized list, such as:

Household > Lawn Care > Decorative > Flamingos

When a page is indexed, it might be given a text-based score for ranking, as well as a category score. A page about Flamingos would be given a category score based upon how well it correlates with flamingos compared to other pages about flamingos.

A page about lawn decorations, which includes information about flamingos and lawn gnomes, might fit into both the flamingo category and the gnome category, but the page’s correlation score for flamingos might not be as high as the correlation score for a page only about flamingos.

What this new patent adds is a description of a category model based upon directory structures in URLs, and how new categories might be added to that hierarchical structure.

Hierarchical Categories Takeaways

In the past, I’ve been asked whether there is much value in using directories within URLs, even for smaller sites.

Is it just as good to use a URL such as:

“http://www.mycar.com-ford-fiesta-cooling/”

As it is to use:

“http://www.mycar.com/ford/fiesta/cooling/”

The keywords that you use in both URLs show what the page is about. Does using a hierarchical category structure help when it comes to SEO?

Using directories in a manner like this means that it can be easier to maintain a site by organizing topics into directories, so there’s a practical reason for their use.

As seen in the patent I’m referring to in this post, the use of directories gives search engines more clues as to how content on the site is organized and what topics it covers, and can help it quickly identify new categories and where they fit within a hierarchy.

Just as important, a directory structure can give visitors a better idea of how information is organized on a site, so it can contribute to a better user experience.

Another consideration is the architecture of the site. Directories can guide decisions about how best to link to pages internally, so that higher-level pages receive more PageRank and lower-level pages receive less.

That can be important if you want the higher-level pages, with more PageRank, to rank for more competitive terms, and the lower-level pages (with less PageRank) to rank for less competitive terms.
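As a rough illustration of how internal linking can push more PageRank toward the upper levels of a hierarchy, here is a toy calculation using the textbook power-iteration form of PageRank. It is a sketch under stated assumptions and says nothing about how Google actually computes or uses PageRank today: the six-page site, the breadcrumb-style linking pattern (each page links to its ancestors and its children), and the damping value of 0.85 are all assumptions I’ve made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict of page -> list of linked pages."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, targets in links.items():
            # A page with no outgoing links spreads its rank evenly over all pages.
            targets = targets or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: every page links to its ancestors (like breadcrumbs)
# and to its direct children.
site = {
    "/": ["/ford/"],
    "/ford/": ["/", "/ford/fiesta/", "/ford/mustang/"],
    "/ford/fiesta/": ["/", "/ford/", "/ford/fiesta/cooling/"],
    "/ford/mustang/": ["/", "/ford/", "/ford/mustang/cooling/"],
    "/ford/fiesta/cooling/": ["/", "/ford/", "/ford/fiesta/"],
    "/ford/mustang/cooling/": ["/", "/ford/", "/ford/mustang/"],
}
ranks = pagerank(site)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{ranks[page]:.3f}  {page}")
```

In this toy graph, the make page ends up with more PageRank than the model pages, which in turn end up with more than the system-level pages beneath them. The result depends entirely on the linking pattern assumed, so a different internal link structure would distribute PageRank differently.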

But it’s not just about PageRank, and with search engines looking closer at knowledge bases and entities to determine how things might be ranked and organized in search results, it’s also about making it easier for a search engine to understand how different objects or people or places might be organized and connected to each other.

17 thoughts on “How Google May Classify Pages Using Hierarchical Categories in URLs”

  1. Completely agree with your “better user experience” point, Bill. A better category structure is the best way to provide smooth navigation for visitors & search engines. That’s why Google shows more category pages than specific pages in results for broad terms. Many SEOs think it is due to the proper use of keywords as category names but fail to understand that Google is more interested in how well the content and quality of the pages belonging to that category satisfy the category name.

    Moreover, when a query is not specific in its meaning, providing broader search results in the form of category pages, rather than particular pages chosen by guessing the meaning, is the best bet for Google.

  2. Just a little typo in the article -> “http://www .mycar.com-ford-fiesta-cooling/”

    I think the category depth completely depends on the site. A corporate site with ford-fiesta-cooling-Xproduct would be a little ridiculous, but a smaller site may be able to get away with being only a couple categories deep.

  3. I try to build/restructure a website’s architecture to mimic breadcrumbs as close as possible.
    When a user enters a website via a lower level page, I would like them to be able to remove the page Url (up to the closest directory / ), and find a page with content for that “parent” category.
    This should continue down to the main homepage.
    @burchems

  4. Good read, Bill.

    I have been privileged to work with some of the largest websites in my country. Sites with tens of millions of pages indexed by Google. Clients ranging from huge multi-national organizations to individuals. I’ve found that information architecture, website structure and URL structure has a big impact on SEO and rankings. I think it is something that has been massively underestimated.

    Here’s what I’ve learned:
    http://searchengineland.com/the-pillars-of-strategic-seo-a-primer-on-website-design-127324
    http://searchengineland.com/how-website-structure-information-architecture-should-mirror-your-business-goals-128138

    Trond

  5. I can’t help but think; didn’t they do that already? it sounds so logical, that I find it hard to believe it’s new …

    anyway, I’ve been the production manager of a link building production line for almost a decade now, but as more and more seo companies seem to disappear, more and more site-owners come directly to me, and the first thing I do; the web site itself;

    filtering out all the crap and adding lots of pages … within a healthy link structure obviously, and it’s ridiculous how well that works, google has been cutting down on off-site so heavily, that doing on-site has become very powerful, and this new thing you talk about will actually only refine our manipulation, doesn’t it?

  6. That’s good insight and makes a lot of sense, but it can get very complex, and I can only hope Google understands the challenge.

    What happens with items that are placed in several categories? Let’s say the same cooling device fits several car models.
    If I have a video driver that is relevant for 50 different laptops, do I need to have the same page in each folder? Then I need canonicals, plus one page in a different navigation that is driver based and not category (laptop) based, because I don’t want one of the specific laptop pages to be the canonical. And I am sure it can get even more complex with other products.

    Would also be interesting to see if the pages BELOW this level have influence on the context of the page. Likely, right?

  7. Thanks for the article Bill.

    Website/URL structure is a classic SEO technique but a technique that I think will never become redundant. I have always explained to clients that URL structure should be designed like a supermarket, with your information or products organised and categorized so they are easy for your customers to find. In other words, you’d expect to find milk in the dairy aisle.

  8. Thanks for explaining this.
    It does sound very logical which is why I think Google is just now doing this, since when has Google been logical.

  9. This article caught my eye because a few months ago I spent a lot of time reorganizing and redirecting pages of my 6-year-old site to reflect hierarchies. I think it was worth the time spent because it seems that this trend with Google is here to stay – and I think this URL organization is helpful to users as well.

  10. Thanks for this article. I fully agree with you, but in some cases you have to set a limit on structured URLs.

    I think the same as you about URL structure. When you have content, try to classify it in order to have something coherent. If that requires having 2 or 3 subdirectories, then do it if it makes sense (I take your example of “http://www.mycar.com/ford/mustang/cooling/”).

    However, for single product/article page, I have the impression keeping the easiest structure may be the best solution, especially if someone decides to change the structure of the site, move some content from one place to another place. It avoids creating a lot of 301 redirects.

    However, I would like having your opinion between these 2 solutions:

    “http://www.mycar.com/ford/mustang/cooling/the-super-cooler-gz-430hv”

    or

    “http://www.mycar.com/the-super-cooler-gz-430hv”

    The first one is the most logical but with the second one, I am not afraid if my client decides to rearrange the structure of the site by changing name of categories/subcategories (we all know that clients like doing that :)).

    Thanks !

  11. Hi Ramenos,

    Yes, for a lot of sites that have products, it’s often wise to have simple URLs for those, so that you don’t end up doing things like putting those products in multiple categories and running the risk of having more than one URL for those products. I do like having a structure of categories and sub-categories (and sometimes even sub-sub-categories as well).
