Do Search Engines Look at Keywords in URLs?

Is there any value in using keywords in the URLs of web pages? Would a search engine look at keywords that you might include in the addresses of your pages, and associate those keywords with the content of your pages in the search engine’s index?

If so, how would a search engine go about examining the URLs of your pages and breaking them down into meaningful parts to identify keywords?

Breaking URLs down into parts may also play a role in how the pages of a web site might be crawled by a search engine.

A newly published Yahoo patent application gives us some ideas on how Yahoo might extract keywords from the URLs of pages and rank them, as well as use information uncovered in the process to determine which pages of a web site to crawl first.

Techniques for Tokenizing URLs
Invented by Krishna Leela Poola and Arun Ramanujapuram
Assigned to Yahoo
US Patent Application 20090083266
Published March 26, 2009
Filed November 6, 2007

A search engine will look at many different signals to determine what a page on the Web is about, and attempt to rank pages based upon keywords that might be an indication of the subject matter or content of those pages.

Many of those keywords are extracted from the content of pages themselves, but a search engine can look at other information associated with pages, such as the addresses of the pages.

Keywords may also be extracted from the URLs of pages by using an algorithm that breaks a URL into its components, understands the structure of those components, and pulls candidate keywords out of the different parts found within the URL.
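To make that idea concrete, here is a minimal Python sketch of one way candidate keywords could be pulled out of a URL. It is not the algorithm from Yahoo’s patent filing: it assumes that splitting each component on non-alphanumeric characters is good enough, and both the function name candidate_keywords and the NOISE_TOKENS list are hypothetical choices made for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of tokens treated as noise rather than keywords.
NOISE_TOKENS = {"www", "com", "org", "net", "html", "htm", "php", "index"}


def candidate_keywords(url: str) -> list[str]:
    """Return word-like tokens from a URL's components, in order of first appearance."""
    parts = urlsplit(url)
    texts = [
        parts.hostname or "",  # domain name, without any port number
        parts.path,            # e.g. "/shopping/search"
    ]
    # Query arguments: keep the values ("blaupunkt"), skip the names ("kw").
    texts.extend(value for _, value in parse_qsl(parts.query))
    texts.append(parts.fragment)  # e.g. "desc"

    keywords: list[str] = []
    for text in texts:
        # Break on any run of characters that is not a letter or digit: / . - _ etc.
        for token in re.split(r"[^a-z0-9]+", text.lower()):
            if not token or token.isdigit() or token in NOISE_TOKENS:
                continue  # skip empty pieces, bare numbers, and noise tokens
            if token not in keywords:
                keywords.append(token)
    return keywords
```

Run against the example URL used in the patent filing, http://www.yahoo.com:80/shopping/search?kw=blaupunkt#desc, this returns ["yahoo", "shopping", "search", "blaupunkt", "desc"]. Simple splitting like this misses words run together without separators (such as "keywordresearch"), which a more complete tokenizer would need to handle.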

Parts of URLs

The patent application provides a definition for different parts of URLs:

Scheme – This section of a URL identifies the internet protocol used to access a resource, such as HTTP or FTP

Authority – The part of a URL that identifies the host server where the documents or resources are located, or the domain name.

Path – This is the information following the slash character after the authority, or domain name, and it identifies the specific page or resource

Query arguments – A string that may appear in a path that can be broken down into name and value pairs, such as “category=shirts”

Fragments – A fragment identifies a subsection within a page that a URL might point to, usually introduced with the “#” symbol

An example of these five different components from the patent filing:

http://www.yahoo.com:80/shopping/search?kw=blaupunkt#desc

In this URL, the scheme is “http”

The authority is “www.yahoo.com:80”, which shows the domain and also includes a port number of “80” in this instance.

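The path is everything after that first single slash: “shopping/search?kw=blaupunkt#desc” (strictly speaking, under the standard URL syntax the path ends at the “?”, and the query and fragment are treated as separate components)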

A query argument shown in this example is “kw=blaupunkt”

A fragment from this URL is #desc
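Python’s standard library can reproduce this breakdown, which may help when checking how your own URLs divide up. Note that urllib.parse.urlsplit follows the standard URL syntax (RFC 3986), in which the path stops at the “?”, so the query and fragment come back as separate fields.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://www.yahoo.com:80/shopping/search?kw=blaupunkt#desc"
parts = urlsplit(url)

print(parts.scheme)                # http
print(parts.netloc)                # www.yahoo.com:80  (the authority)
print(parts.hostname, parts.port)  # www.yahoo.com 80
print(parts.path)                  # /shopping/search
print(parse_qsl(parts.query))      # [('kw', 'blaupunkt')]  (name/value pairs)
print(parts.fragment)              # desc
```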

Tokenizing URLs for Keywords and Web Crawling

The patent application describes a way that it might break down URLs into parts, or components, to extract keywords from URLs. Those keywords could be used to categorize pages for web search, and to understand what pages are about when providing advertisements for those pages.

This breaking down of URLs into components, and into even smaller parts, is referred to as “tokenizing URLs.” In addition to helping a search engine find keywords in URLs, it can have an impact on the indexing of the pages of a web site:

The tokens generated by URL tokenization may also be assigned with features of the web document to improve the efficiency of a web search. Tokenizing URLs is also the first step when clustering URLs of a website. Clustering URLs allows the identification of portions of a web document that hold more relevance. Thus, when a website is crawled by a search engine, some portions of web documents may be white-listed and should be crawled, while other portions may be black-listed and should not be crawled. This leads to more efficient web crawling.
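The quoted passage doesn’t describe how URLs are clustered, so the following Python sketch is only an assumption about what such clustering could look like: URLs are grouped by a coarse pattern in which numeric path segments and query argument values are replaced with “*”, and any URL whose pattern has been black-listed is left out of the crawl. The names url_pattern, cluster_urls, and select_for_crawl, and the example.com URLs, are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def url_pattern(url: str) -> str:
    """Reduce a URL to a coarse pattern used as a (hypothetical) clustering key."""
    parts = urlsplit(url)
    # Numeric path segments (IDs, page numbers) become a wildcard.
    segments = ["*" if seg.isdigit() else seg for seg in parts.path.split("/")]
    pattern = (parts.hostname or "") + "/".join(segments)
    # Keep query argument names, but not their values.
    names = sorted({name for name, _ in parse_qsl(parts.query)})
    if names:
        pattern += "?" + "&".join(f"{name}=*" for name in names)
    return pattern


def cluster_urls(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs that share the same pattern."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        clusters[url_pattern(url)].append(url)
    return dict(clusters)


def select_for_crawl(urls: list[str], black_listed: set[str]) -> list[str]:
    """Keep only URLs whose cluster pattern has not been black-listed."""
    return [url for url in urls if url_pattern(url) not in black_listed]


urls = [
    "http://www.example.com/products/123",
    "http://www.example.com/products/456",
    "http://www.example.com/search?q=shoes",
    "http://www.example.com/search?q=hats",
]
clusters = cluster_urls(urls)
to_crawl = select_for_crawl(urls, black_listed={"www.example.com/search?q=*"})
```

Here the two product pages share the cluster “www.example.com/products/*” and the two search pages share “www.example.com/search?q=*”, so black-listing the search pattern leaves only the product pages to crawl. Treating a whole cluster as one decision is what makes this cheaper than judging every URL individually.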

Conclusion

Yahoo provides a fair amount of detail in the patent filing on how URLs can be broken down into components, how keywords can be extracted from those components, and how those keywords might be ranked. If you’re interested in how the URLs of your site might be treated under this process, it’s worth spending some time with the patent filing itself to get a grasp of the technical details. Keep in mind that the processes described in this patent application may not be the ones that Yahoo is presently using.

A cautionary note – changing the URLs to your pages, especially if those URLs have been around for a while and are indexed by search engines, is an undertaking that shouldn’t be started without careful consideration, and without a cautious approach that keeps the risk behind such a change to a minimum. Such an approach can include setting up proper redirects (permanent 301 redirects) from the old URLs to the new ones, so that external links pointing to pages of the site still reach them; updating the internal links on the site itself to use the new addresses; and other technical methods that might help a site retain its rankings in search engines. How a search engine reacts to changes in the URLs of a site’s pages can vary from one search engine to another, and traffic to those pages may be negatively impacted for a period of time regardless of how carefully such a change is implemented.
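For a site served by a small Python application, a permanent redirect can be as simple as a lookup table. The sketch below is a minimal WSGI app written under that assumption; the REDIRECTS mapping and its paths are hypothetical, and most sites would configure the same rule in their web server (Apache or nginx, for example) rather than in application code.

```python
# Hypothetical table of old paths mapped to their new, keyword-bearing paths.
REDIRECTS = {
    "/page42.htm": "/books/philosophy/kant/critique-of-pure-reason.htm",
}


def app(environ, start_response):
    """Minimal WSGI app: answer moved pages with a 301, everything else with a 404.

    A real site would serve its normal pages instead of the 404 branch.
    """
    path = environ.get("PATH_INFO", "/")
    new_path = REDIRECTS.get(path)
    if new_path is not None:
        # 301 = moved permanently: browsers and crawlers should use the new URL from now on.
        start_response("301 Moved Permanently", [
            ("Location", new_path),
            ("Content-Type", "text/plain; charset=utf-8"),
        ])
        return [b"Moved permanently.\n"]
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not found.\n"]
```

To try it locally, the app can be passed to wsgiref.simple_server.make_server and run with serve_forever(). The choice of 301 rather than 302 matters here: a 301 signals a permanent move, while a 302 signals a temporary one.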

P.S. Nice introduction to keyword research here: How To Choose Keywords and Variations of Keyword Phrases – SEO Basics (Sorry – no longer available.)


46 thoughts on “Do Search Engines Look at Keywords in URLs?”

  1. Your advice about not changing the names of already solidly indexed pages is wise. You could be throwing away years of well indexed pages for something that might well not pay off. I’d rather have a non-keyword-rich page name that performs in Google than risk using keyword-rich names. I’m even wary of changing page titles and descriptions that have been indexed for long periods of time.

  2. Hi Adam,

    Thank you. Those are great points.

    There’s always an element of risk in putting pages on the Web to support a site or a business, but being aware of the potential risks behind changes is important. I would be hesitant to change the URLs of pages, or even page titles and descriptions, that have been ranking well for an extended period of time, too. Sometimes the risks might be worth taking, but other alternatives should be considered as well.

  3. Doesn’t Google’s search engine already do a pretty good job of this? I hardly use Yahoo as a search engine anymore. I have found that Google already does a pretty good job of ‘tokenizing’ keywords in urls.

  4. Pingback: Links for March 26th | jonathan stegall: creative tension
  5. Pingback: How To Choose Keywords and Variations of Keyword Phrases - SEO Basics
  6. Hi People Finder,

    I try to use a variety of search engines, to keep an eye out for the differences that I see.

    Google may do something similar, though it’s not easy to tell.

    I haven’t seen anything directly from Google about looking in URLs for keywords, but they have given us some interesting information about how they try to understand different URLs. My post Solving Different URLs with Similar Text (DUST) is about a paper co-written by Ziv Bar-Yossef, who joined Google around the time that the paper was published.

    The paper is from 2006, and it describes how the search engine might uncover the same content at different URLs. The way that they break apart URLs and parts of URLs covers some of the same ground as in this Yahoo patent application. Don’t know if Google took the next step of pulling keywords from parts of URLs, but it’s possible.

  7. Hey Bill, interesting article for me because I used the same method for parsing urls for a bot to assess the optimization of pages. Although Google has not written anything specific about this, you can read between the lines in the original PageRank paper that urls are analyzed as part of the contextual analysis of the link. IMO, if you have optimized the site link architecture and page objects, then the link text a webmaster uses to link to you is less crucial, because the optimized URI structure contains the phrase.

  8. Glad to see you added the 301 perma redirect suggestion. Without it, one day in the SERPs, next day gone. Oops. Scary.

    PS, thanks for the link. ;-)

  9. Hi Kimberly,

    Yes, changing the addresses of your pages without using a 301 redirect to let search engines and other visitors know is scary. :)

    You’re welcome on the link. I love your examples. If people start considering including keywords in the URLs to their pages, they should choose good keyword phrases that make a difference.

  10. I had always wondered how search engines might handle the fragment/anchor (#) part of a URL, particularly in internal linking.
    Chances are Google has something a bit more advanced than this; I didn’t see any mention of non-ASCII characters, for example. It gives some interesting ideas of how the URL could be used to set crawling priority, though.

  11. Hi Terry,

    Interesting that you use a similar method. I think it’s essential to pay close attention to the URL structure of pages on a site to see how search friendly they are, and to avoid having the same pages show up on a site at different URLs.

    A paper co-authored by Lawrence Page from those early days does tell us that Google was probably paying a lot of attention to URLs when crawling pages:

    Efficient Crawling Through URL Ordering

    One of the importance metrics used to determine which pages to crawl is the location of a page, as determined by looking at its place within a URL:

    Location Metric. The IL(P) importance of page P is a function of its location, not of its contents. If URL u leads to P, then IL(P) is a function of u. For example, URLs ending with “.com” may be deemed more useful than URLs with other endings, or URL containing the string “home” may be more of interest than other URLs. Another location metric that is sometimes used considers URLs with fewer slashes more useful than those with more slashes. All these examples are local metrics since they can be evaluated simply by looking at the URL u.

    In The Anatomy of a Large-Scale Hypertextual Web Search Engine, text from URLs themselves isn’t discussed, but the importance of using anchor text and text associated with that anchor text to understand the page being pointed to within a link is. The meaning of “text associated with anchor text” isn’t really defined in the paper, and could include information from the URL itself as easily as from text surrounding a link.

    Since the team from Google was looking closely at the contents and structure of URLs for crawling, and looking just as closely at links for indexing purposes, chances are that in the decade since those papers were published, Google’s use of words within URLs could be playing a role in the indexing of content on pages being pointed towards.

  12. Hi David,

    I was excited to see them mention fragments in this patent filing, too. I don’t believe that I’ve ever seen a link with a fragment in it in a search index result, but that doesn’t mean that the information within those internal page links can’t be used by the search engines to understand the content of a page better.

    I’m not sure that I’ve seen any reference to a search engine paying attention to the words in URLs to determine crawling importance order before, either.

    The use of non-ASCII characters in URLs isn’t addressed here, but I’m sure that issues like spaces appearing in URLs happen often enough that it’s something that had to be addressed.

  13. Yes, keeping keywords in urls is really good for SEO, but sometimes it creates problems too. I changed the old urls of my website a few days back, but Google has not indexed them yet. I don’t know what the reason for that could be.

  14. Nice post

    The domain name is the hard part to change – as it mostly is your business name. The last part of the URL must include keywords if you want to optimize your rankings.

    Some people think query arguments can damage the ranking of the page, or even keep it from getting indexed. I know Google doesn’t want to index search results from your page, but I still have to see some hard evidence before I vote 100% against query arguments.

  15. Pingback: » Pandia Search Engine News Wrap-up March 29
  16. Hi Miami web design,

    Did you set up 301 redirects for the old URLs to point to the new URLs for visitors from outside of your pages? That can help search engines find the new locations of pages that you moved, as well as helping visitors who might be coming to your site through the search engines, through links found on other pages, through bookmarks that they might have saved, through old links in emails or newsletters or on printed materials.

    Did you actually change the URLs in links on your site to use the new addresses for pages instead of the old ones?

    When making a change like that, it also isn’t a bad idea to try to find some new links to at least some of those pages at their new addresses, and to try to get links pointing to the old addresses changed, if appropriate and possible.

    If you’ve created an XML sitemap for the site, did you update it with the new URLs? If you don’t have an HTML sitemap, creating one with the new addresses in it can help as well.

    Even if you do all of those types of things, if you change the URLs for older pages, it may take a while for the search engines to update their indexes. It’s one of the risks of making such changes.

  17. Hi Steen,

    I agree with you on domain names – people should think long and hard about making a change to their domain name, especially if a site has been around for a while and has a good number of links pointing at it. If there is a compelling reason to change, then it might be worth taking the risk of doing so. That kind of reason might include such things as a change in business name or business model or business owner, a complete rebranding of a business, a legal requirement to change, the sudden availability of a much better domain name. Not every domain name is the name of a business – it gets harder and harder to find domain names as people register more and more of them.

    I don’t agree that URLs absolutely need to contain keywords, but I do believe that it is very likely that they are another ranking signal that a search engine may consider when deciding what a page is about, and what query terms it might be relevant for.

    Excessive numbers of query arguments or directory levels in a URL may hinder the indexing of pages. That’s a great topic for another post.

  18. Hi Nick,

    Very good observation. Google has had a way to search for keyword phrases in URLs for years, which means that Google has been indexing keyword phrases in URLs. I don’t think that means that we can’t learn some things from this patent filing, though.

    The Yahoo patent filing leads to a number of questions about keywords in URLs, and provides hints at some possible answers.

    We don’t know for certain whether or not the presence of a keyword in a URL makes it more likely that a page will rank more highly for that keyword.

    Or whether keywords in URLs might be used by a search engine to decide what advertisements to show on a page if the publisher of the page uses advertisements like AdSense.

    We also haven’t seen too much directly from the search engines on how they might actually look at the words and symbols in URLs, extract keywords from them, and rank those keywords compared to other keywords in a URL. There have been a few whitepapers from search engines on how URLs might be explored to identify different URLs that show the same page. One of the very early Google whitepapers tells us that the search engine was avoiding indexing pages that included “cgi” within their URLs.

    One part that I found really interesting was that this patent filing also adds the concept that a search engine might focus upon crawling some URLs based upon the keywords that appear in them before it will crawl other pages of a site (rather than just executable files that might appear in a cgi bin).

  19. One of the most informed posts on the subject I have seen!

    Although some search engines look at keywords in URLs, I believe the ‘rewards’ of placing keywords in URLs are very small. But hey, all these small SEO tweaks add up.

  20. Thank you Lee (Business Galore Directory),

    I’m not sure how much value there is in placing keywords in URLs either. I think it’s likely that keywords in URLs carry less importance than keywords in page titles, in headings on pages, in the anchor text of links pointing to those pages, or in the content on the pages themselves.

    Given a choice between spending a good number of hours rewriting the URLs of this web site to include keywords or spending the same number of hours writing blog posts, I’d rather write the blog posts. :)

  21. Great post mate … a lot of discussion here too.

    I am of the opinion that you should try to add the keyword to the url, but do not change it if you have an existing site; it is not that important.

  22. I think if you have any text associated with your website it should always be SEO friendly. Search engines read everything.

  23. Hi Nick,

    Good point. Many sites rank well in search engines without keywords in their URLs, but it doesn’t hurt to consider using them, and using them wisely.

    I really like developing pages with a nice hierarchical structure to them, with well defined information structures and directory names and file names that are meaningful, for search engines as well as visitors and the designers and developers who create those pages and maintain them.

    That’s something that I really haven’t discussed in this post – how intelligently crafted URLs can bring value to people who use a site regardless of the potential value of keywords in URLs when it comes to search rankings. A URL like the following has meaning to the human eye:

    “http://www.example.com/books/philosophy/kant/critique-of-pure-reason.htm”

    It provides a hint of the structure of a site to human visitors and to search engines. It might also be more attractive to searchers who see the URL as part of a search result when they are looking for that specific book, for other books by the same author, or for books within that genre. People might also reverse hack the URL in their browser address bar, to see if it leads to a more specific page about the author, like this:

    “http://www.example.com/books/philosophy/kant/”

    Search engines do read everything, and people sometimes do as well. :)

  24. Thanks, Wollongong

    I share your opinion. If you’re starting a new site, definitely consider including keywords in URLs. If you have an existing site, there may be other things that you can do that would have a greater impact than changing your URLs, with less risk attached to them.

  25. In my experience, Google does look for keywords pertinent to the content, as long as the content is original. Since Google changed their algorithm, they now look only for fresh content, so bloggers who are creating original content are on the right path.

  26. Bill, great piece.

    Like it so much I tweeted it.

    Given the searcher engagement impact of URL and clickability of keyword bolding in the SERPs, having keywords in the URL seems like a no-brainer, regardless of whether or not there are proven or direct benefits to textbook SEO.

    Cheers,
    Ken

  27. Hi Ari,

    I think that’s a good point. Fresh and unique content is likely one aim that Google has, especially when it comes to news and possibly to blog posts. For other types of searches, those factors may not be as important. For example, the best pages about the Magna Carta or the U.S. Constitution might not be fresh news or pages, and those documents haven’t changed much over the years.

  28. Hi Ken,

    Thanks for the tweet, and for your kind words. I agree – having a URL that seems relevant in the search results does seem like something that might make it more likely that a person will follow that result to a page. A great title and snippet don’t hurt either. :)

    I’ve found some other interesting research since this post that I’ll probably be writing about in the near future.

  29. Hi Michiel,

    I’ve been uncovering some other potential benefits to including keywords in URLs. Rankings don’t seem to be the only reason why they might be useful.

  30. Everyone has a different view regarding this issue, and I have one too. For me, having keywords in the URL does not have any significance. When you analyze two groups of websites, where one has keywords in URLs and the other does not, the one with keywords in URLs will always score less than the others on so many parameters.

  31. Hi cousnseling,

    I’ve seen sites that rank highly without keywords in URLs and other sites that rank highly with keywords in URLs. I’m not sure that comparing different sites is helpful, when it’s very likely that the rankings for those sites are also affected by many other signals.

  32. In my opinion, keywords in urls are a very important factor in the search engines’ ranking algorithms. However, one has to be really careful before making any changes, because a page might already be ranking well. Another reason is that there might be many web sites that already link to a specific page, and by changing the URL, all these links will be lost (especially if there is no 301 redirect from the old url to the new one).

    In such cases it would be wiser to deal with other on-site factors (page copy, page titles, link structure, etc.) rather than tweak the urls.

  33. Hi Stefanos,

    I agree with you completely – it really does make sense to understand both the benefits and the potential risks of making such a change. We know that there are likely more than 200 ranking signals that Google might use in associating a query with a page – and that some signals probably carry more weight than others.

    If someone has a site with around 1,000 pages, without keywords in the URLs and with many links to those pages using the keyword-less URLs, going back and changing all of those URLs is likely a much poorer use of effort than improving the content and keywords on the pages themselves, and in links to those pages. That time and effort may also be better spent creating new content and new pages.

  34. Google definitely doesn’t discourage the use of static URLs with keywords in them. In their SEO starter guide under the subheading “Good practices for URL structure” they state:

    “URLs with words that are relevant to your site’s content and structure are friendlier for visitors navigating your site. Visitors remember them better and might be more willing to link to them.

    Avoid: using excessive keywords…”

    They even mention “…rewriting their dynamic URLs to static ones…”

    Mark

  35. Hi Mark,

    One thing that is a little interesting about the advice in Search Engine Optimization Starter Guide (pdf) is that it doesn’t suggest that those words may be helpful in the rankings of pages. Instead, they tell us that:

    1. Visitors might find URLs with words in them to be friendlier
    2. The crawling of your pages may benefit from having words in URLs
    3. When links to your page use your URL instead of anchor text, the words in the URL might help the search engine better know what the page is about (and presumably would help in ranking by indicating the relevance of the words in your URL to the content on the page).
    4. URLs to pages are displayed in search results, and a URL with words in it may be more attractive to potential visitors.

    Those are definitely some benefits worth considering when deciding whether or not to use words within URLs for your pages.

  36. Yes, search engines look at keywords in URLs. While these KWs are not as powerful as say, the KWs in the title tag, they still are quite important. Google and other search engines see if the URLs work in tandem with KWs in the title tag, meta, H1, on-page terms, incoming links and links from the hosting domain. When all of these match up, the URL KWs give more power to the terms you aim to rank for.

  37. Hi Cameron,

    This is one of those areas where there’s a scarcity of information directly from the search engines, such as in patents or whitepapers or blog posts, about the actual value and weight of keywords in URLs, so it’s hard to get a sense of how much actual value they might have.

    There are plenty of websites that rank well for different terms that don’t include keywords within URLs, but there may be some advantage to including them. At the very least, when you include keywords in your URLs, and someone links to your page using your URL itself, you do get the value of those keywords in the URL as anchor text.
