Study on the Structure of Search Queries

Would it surprise you if over 40 percent of the queries entered into search boxes at search engines consist of proper nouns, such as the names of specific people or places or things?

Or that combinations of proper nouns and nouns might make up over 70 percent of most searches?

At least those are a couple of the conclusions from researchers at Yahoo who are trying to find effective ways to better understand the structure of search queries used by searchers.

A study of queries entered into Yahoo’s search engine in August of 2006 took a close look at The Linguistic Structure of English Web-Search Queries (pdf), and tried to get an understanding of the way that people phrase what they are looking for when they search.

The researchers behind the study came up with some interesting information about the queries that people use, and the structure of those queries.

For example, when people perform a second search related to a previous search, one of the most common things that they change in their search terms is a number. Someone looking for information about the first Spiderman movie might type “spiderman 1” into a search box. Following up, they may then type in “Spiderman 2.”

The type of word most likely to be reformulated is “number.” Examples included changing a year (“most popular baby names 2007” → “most popular baby names 2008”), while others included model, version and edition numbers (“harry potter 6” → “harry potter 7”) most likely indicating that the user is looking at variants on a theme, or correcting their search need.
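
If you want to look for this pattern in your own query logs, here is a minimal sketch in Python. It is not the method from the Yahoo paper. It simply assumes that a "number reformulation" is a pair of queries with the same number of words where every word that changed is a whole number, and the function names are made up for illustration.

```python
def changed_tokens(first_query, second_query):
    """Return (old, new) word pairs that differ, or None if the lengths differ."""
    a = first_query.lower().split()
    b = second_query.lower().split()
    if len(a) != len(b):
        return None  # not a simple one-for-one substitution
    return [(x, y) for x, y in zip(a, b) if x != y]


def is_number_reformulation(first_query, second_query):
    """True when at least one word changed and every changed word is a number."""
    diffs = changed_tokens(first_query, second_query)
    if not diffs:
        return False  # different lengths, or nothing changed at all
    return all(x.isdigit() and y.isdigit() for x, y in diffs)


print(is_number_reformulation("spiderman 1", "Spiderman 2"))
print(is_number_reformulation("seafood new york", "seafood new jersey"))
```

The comparison ignores case on purpose, since the same searcher may capitalize one query and not the next.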

The study also looked at the use of capitalization in search queries. While many people search for proper nouns at a search engine, only a small percentage of those searches use capital letters for the first letter of names:

However, the great majority of queries contain no informative capitalization, so the great majority of proper nouns in search queries must be uncapitalized. We cannot, therefore, rely on capitalization to identify proper nouns.
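
To get a feel for how rare capitalization is in your own referral or site search data, a rough check like the one below may help. The post does not spell out how the paper defines "informative" capitalization, so this sketch assumes a simple stand-in: a query counts only if it mixes upper and lower case letters, which means all-lowercase and ALL-CAPS queries are both treated as uninformative.

```python
def has_informative_capitalization(query):
    """Assumed definition: the query mixes upper and lower case letters."""
    letters = [c for c in query if c.isalpha()]
    if not letters:
        return False
    has_upper = any(c.isupper() for c in letters)
    has_lower = any(c.islower() for c in letters)
    return has_upper and has_lower


def share_with_capitalization(queries):
    """Fraction of queries that pass the check above (0.0 for an empty list)."""
    if not queries:
        return 0.0
    flagged = sum(1 for q in queries if has_informative_capitalization(q))
    return flagged / len(queries)


# Illustrative queries only, not data from the study.
sample = ["spiderman 2", "Spiderman 2", "harry potter 7", "new york seafood"]
print(share_with_capitalization(sample))
```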

The paper provides some insight into how tagging different words in a query as a proper noun, or noun, or verb, or number, or conjunction (or other parts of speech) might help people working at search engines to understand how people search, and how people reformulate their searches.
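
Once the words in a set of queries carry part-of-speech tags, figures like the share of proper nouns are just counts. The sketch below assumes the tagging has already been done, by hand or by some tagger, and uses made-up tag names (PROPN, NOUN, NUM). It does not reproduce the paper's tag set or its numbers.

```python
from collections import Counter


def tag_shares(tagged_queries):
    """Return each tag's share of all tagged words across the queries."""
    counts = Counter(tag for query in tagged_queries for _, tag in query)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Hand-labeled toy example, purely illustrative.
tagged = [
    [("new york", "PROPN"), ("seafood", "NOUN"), ("restaurants", "NOUN")],
    [("spiderman", "PROPN"), ("2", "NUM")],
]
for tag, share in sorted(tag_shares(tagged).items()):
    print(tag, round(share, 2))
```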

When you perform a search at a search engine, how do you formulate your query?

  • Do you ask questions? (where are seafood restaurants in new york)
  • Do you include punctuation in your query? (where are seafood restaurants in new york?)
  • Do you type in a string of words in any order? (seafood new york)
  • Do you use grammatically correct phrases? (new york seafood restaurants)
  • Do you capitalize proper nouns? (New York seafood restaurants)
  • Do you include adjectives with your search terms? (best new york seafood restaurants)
  • Are you placing conjunctions in your searches? (new york and seafood restaurants)
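
Several of the questions above can be answered mechanically for any query you find in your logs. The following is a minimal sketch, assuming small hand-picked word lists for question words and conjunctions (both are my own assumptions, not lists from the paper). Spotting adjectives or judging whether a phrase is grammatical would need a real part-of-speech tagger, so those are left out.

```python
QUESTION_WORDS = {"who", "what", "where", "when", "why", "how", "which"}
CONJUNCTIONS = {"and", "or", "but"}
PUNCTUATION = set("?!.,;:")


def describe_query(query):
    """Report a few surface features of a single search query."""
    words = [w.strip("?!.,;:").lower() for w in query.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    return {
        "is_question": bool(words) and words[0] in QUESTION_WORDS,
        "has_punctuation": any(c in PUNCTUATION for c in query),
        "has_capitals": any(c.isupper() for c in query),
        "has_conjunction": any(w in CONJUNCTIONS for w in words),
        "word_count": len(words),
    }


print(describe_query("where are seafood restaurants in new york?"))
print(describe_query("New York and seafood restaurants"))
```

Running this over a few hundred real queries and tallying the results is usually enough to see which of these habits your own visitors have.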

If you own a website, you probably want to understand how people use your site, and how you might improve your site to make it more likely that people will return and take advantage of what you have to offer on your site. It’s not surprising that search engines would want to gain more insight into how people search, and how they can provide better results based upon the way that people perform searches.

This paper is interesting in that it provides us with a window into one aspect of how a search engine might be trying to better understand its audience.

While this study might provide people who do keyword research and search engine optimization with some questions to ponder about the keyword research that they do, it should also provoke people who provide information or services or goods on their web sites to question how they might work to better understand their audiences.

If you own a web site, and have a site search on your site, do you look at what people are searching for and how they are searching, to try to understand better what they might be trying to find on your site?

Do you take a look at the query terms that people use to find your site in referrals from search engines in your log files or analytics programs, and question those, and consider whether the people finding your site from the major search engines are getting their informational or transactional needs addressed when they come to your site?
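
If your log files record the referring URL and that URL still carries the search terms (which is not always the case), pulling the queries out is straightforward. The sketch below assumes the terms arrive in a `q` or `p` query-string parameter. The referrer URLs shown are invented examples, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumed parameter names that may hold the search terms.
QUERY_PARAMS = ("q", "p")


def query_from_referrer(referrer):
    """Return the search terms found in a referrer URL, or None."""
    params = parse_qs(urlparse(referrer).query)
    for name in QUERY_PARAMS:
        values = params.get(name)
        if values:
            # Lowercased so that differently capitalized queries count together.
            return values[0].strip().lower()
    return None


def count_queries(referrers):
    """Tally the search terms across a list of referrer URLs."""
    counts = Counter()
    for ref in referrers:
        query = query_from_referrer(ref)
        if query:
            counts[query] += 1
    return counts


referrers = [
    "http://search.example/search?q=New+York+seafood",
    "http://search.example/search?p=new%20york%20seafood",
    "http://blog.example/some-post",
]
print(count_queries(referrers).most_common())
```

Drop the `.lower()` call if you want to study capitalization itself rather than just which phrases bring people in.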

If you receive emails or phone calls about what is offered on your site, do you pay attention to the questions and queries that people might have, and see if your site might (or could) answer some of those questions in a helpful way?

What do people ask for from your site, and how do they do it?

58 thoughts on “Study on the Structure of Search Queries”

  1. Interesting article…

    It is always imperative for any SEO to try to understand user behavior & intent when researching search phrases for any given SEO campaign.

    It isn’t always clear and easy, and it isn’t always easy to get a client to understand this behavior either. However, without understanding this you pretty much defeat the purpose of SEO.

    Thanks for the insight…

  2. This is very interesting, makes me wish my seo knowledge was a bit deeper. But your site is a great resource for someone like me.

  3. Another great article, Bill. Your blog has to be one of the best, non hype, real SEO sites out there.

    I am always amazed when checking keyphrase logs to see the number of people that ask a full question, as if they are asking a real person. Don’t think I have seen any punctuation yet though.

    Thanks for your useful insight as always.

    Rob

  4. nice post. you ask good questions. having a solid seo strategy often makes people consider how visitors are interacting with their site or even with their entire business. keep the good stuff coming.

  5. The questions you provide above inserted into a search query would lead to longtail keyphrases and this would be a very niche area for any given search. This leads to some very interesting research. Very good post!

  6. Wow! … 40% proper nouns and 71% combination of nouns and proper nouns. 40% proper nouns was more than I would have guessed.

    I wonder of the 40% proper nouns, what fraction are local in nature, people / vanity searches, and what percent are “branded” searches.

    Also of the remaining 29% … what are people typing in that aren’t nouns? Looking at my own search history … almost all my queries include some noun. Well then again, almost all my queries have special modifiers in them as well ;-) (e.g. -kwd, site:, +kwd, etc.)

    The data tagging process is also particularly interesting. I’ve tried similar work in the past by attempting to categorize by the larger supersets of Navigational, Information, and Transactional … and even that’s a challenge at times.

    Great article! Thanks for sharing!

  7. Bill,

    Internetnews.com reported on 2/13/09: “Rather than ignore Google, Marchex actually works in conjunction with the search giant’s ad network. Last month it became the first, and for now only, Google ‘AdWords authorized reseller technology platform.'”

    Isn’t Marchex niche domain names – nouns – with “local” adjectives as modifiers? Content and search functionality are now married into a domain name – with location as a “proper” noun. Who owns “the domain name” “newyorkrestaurants.com” vs “the search box” “new york restaurants?” Owned and Operated. Divided, only by “spaces.” Even for small, independent businesses, isn’t Google forcing us to play in its sandbox – oops – search box?

  8. The book mentioned is more than 2 years old. I think search engine queries have changed a lot since then. I’m quite an experienced internet user and I usually type keywords in my search queries.

    I saw that many users, especially younger ones, don’t type URLs into address bar but search it through google. Quite strange but often seen.

  9. When you look at “search volume” in the Google keyword tool, you’ll see that most people search for gerund names.
    Check the term “SEO”: how many people search for the term “SEO expert” or “SEO master”? The difference is huge.

  10. I know from my own searches that I mostly use adjectives, nouns and proper nouns and names – especially when looking for specific people.

    The study is interesting because verbs do seem to take a back seat to the other words in this study as well as in my own searches.

  11. Bill, do you ever think we will see the day where the search engines actually communicate with searchers and educate them on how best to indicate their real intent in the form of a search query? Most people don’t know how to search, thus trying to deduce their true intent may not be the best way to study their behavior. I really like how searchme blends the search functionality with a browse functionality. For example, when I type in dolphin, it suggests wildlife, sports, travel, etc., helping me to clearly define my intent. Relevance…the key to search engine success, is really a two way street. Just some thoughts.

    ps. im a longtime reader and first time commenter. i really appreciate ur work, and the work of others who really dig into the core of this industry. whether we call it seo science, academic seo, or whatever, it is this type of work that helps contribute to the success of our industry, and the understanding of interaction with the web. for that…i thank you :)

  12. Hi Agent SEO,

    You’re welcome. Intent is a key to trying to understand what people are looking for, and how they might look. One of the difficulties is in assuming that everyone searches the same way that you might. That can be a big roadblock.

    Hi John,

    Thanks.

    Hi Rob,

    It is interesting going through log files and seeing what terms people do find a site for. Doing it on a regular basis can show you how often people might get to your site through search phrases that you may never have anticipated.

    Hi Ryan,

    Yes, a good strategy is essential. As is the ability to redefine your strategy when necessary.

    Hi phaithful ,

    Yes, I found the numbers pretty interesting. Much higher with proper nouns than I would have expected. The paper doesn’t break those down further, and doing so might pose some difficulties, but I would like to see those numbers as well. Thank you.

    Hi Sylvia,

    One of the things that they were tagging as well were URLs, to see how often those were searched for, and how people were including them in their queries. Hopefully we will see more research along these lines.

    Hi Odzyskiwanie,

    Not sure which book you are referring to, but the Yahoo paper came out in October of 2008, and I didn’t see any discussion of it on the Web. The queries that were examined were from 2006, which could be considered a long time ago on the Web, but it’s hard to tell if the way that people search has changed that much in a couple of years.

    What’s interesting isn’t so much the conclusions about how people search as it is how researchers are working towards finding tools to use to understand how queries are structured, and the implications behind being able to use structure information. If the structure of a query might influence the results that we see from a search engine, then this kind of structure analysis could have some interesting implications…

  13. Hi People Finder,

    I was intrigued by the low numbers of verbs used in queries as well. I’ll be paying more attention to how I structure my queries after reading this paper. :)

  14. Very interesting, thanks for posting that. I usually have a set way that I search for things on the net, so there is some really good insight there into how most people search. So that’s some good stuff for keyword research.

  15. When keyphrases are used to bring someone to my site, they type in lower case no punctuation for the most part. I noticed that if the phrase uses a verb, they stay on site longer, searching more pages (just a weird factoid).

    Another strange trend that I noticed when reviewing keywords for one of my sites has been the use of one word in a search on Live. Mainly nouns, but a few times prepositions. This happens between two to three times a day. My site will be on page 10 for example in the SERPs, and a few pages are examined . Maybe someone doing research?

  16. Hi Frank,

    Very interesting observations, especially about searchers who use verbs staying on your site longer. Referrals from Live.com are odd, and I’m not quite sure that they are that accurate. I’ve seen the same kind of thing from them. I wonder sometimes if what we see in those referrals indicates that a searcher may have used a different term than what is reported, and that we are seeing a category for the term they used rather than the actual search keywords.

    Hi Kyle,

    Thanks.

  17. This article was so interesting! The variation across the millions and millions of search queries is so great, yet also so similar across the board so it’s overwhelming to think what is being put out there. But when broken down, studying the structure of search queries can be a powerful tool when optimizing for keywords and ppc campaigns.

  18. Hi Gary,

    Thank you. I agree with you – getting a better understanding of not just what information people are searching for, but also how they are writing their queries is helpful for the search engines, and the owners of sites indexed by those search engines, who might be using ppc or SEO.

    I’m hoping that this paper inspires some additional research along the same or similar lines.

  19. Great article, I learned a little about this through Google Analytics. At first I would try to rank my main keywords. They started ranking but I saw more people search for questions and also search for my keywords in reverse order.

  20. Hi Clark,

    Thanks. Targeting main keywords isn’t a bad idea, but it is possible to get as much or more traffic to your site through longer tail terms or phrases that fit appropriately into the content of your site and may be words that your audience will use to find your pages. In addition to targeting those headterms, or main keyword phrases, it doesn’t hurt to build upon your content to include related phrases and concepts and synonyms. Doing that may make it more likely that you’ve developed some well written and thoughtful content. What I liked about this study was that it not only had us start thinking about using words that our audiences were likely to search for, but also how they might phrase their queries.

  21. Great article, I always start doing long phrases for my SEO Clients because aiming for the main keywords right off the bat takes a while and they want to see results right away. I also found that many people search longer tail terms than their main keywords that were most competitive.

  22. Hi OC Marketing,

    I think that can be a good approach – starting with longer phrases, but keeping opportunities open for the more competitive phrases. It can often be overwhelming diving straight into a competition with sites that have been established for a while.

  23. Pingback: Editor’s Picks: February 16-20, 2009 | Search Marketing Standard
  24. Pingback: SEO Daily Reading - Issue 143 « Internet Marketing Blog
  25. Interesting post. We know that if a PPC ad headline exactly matches a searchers query then click through rates go up. It follows that subtle improvements in site conversion might follow if we make sure that page elements like headings and menu links exactly match the case and spelling of the keywords that drive traffic to the page. For example, it might be better for us to use the menu item ‘ppc services edinburgh’ than ‘PPC Services Edinburgh’.

  26. Hi Eamon,

    I’m not completely convinced that matching a page title or heading with queries will be that helpful in capturing clicks in organic search results. But it’s definitely worth experimenting with. :)

  27. Interesting indeed. Considering that this would encompass just about every single local search – I suppose it does make sense. Searches like “Hicksville Widget Builder” for example, where Hicksville is the proper noun.

    Also, as a service provider or product developer – I think this goes to show the value of developing your own brand.

  28. Hi Danny,

    I know that Google, Yahoo, and Bing have all spent a fair amount of effort in trying to understand when queries might be geographically related even if a location term isn’t included in a query itself, such as in a search for “pizza.” So, there are likely many local searches that don’t always follow some kind of specific structure. But I do think it’s very useful to try to keep in mind how people might be structuring their queries, especially when you might be doing keyword research.

  29. When I look back through my search history the words/phrases that I use are mainly straight to the point. No fluff like good, great. No conjunctions nor adjectives. Does not need to be in any specific order but it does need to be grammatically correct or you will have to rely on google’s spell check.

    Do you capitalize proper nouns? (New York seafood restaurants)

    There is no need to capitalize, letters are letters.

  30. Hi Preston,

    I would guess that my search history might look a little different than a lot of other people’s as well. But, that’s one of the things that I was impressed with, that we get the chance to learn about the searching habits and approaches of a lot of people.

    I don’t think many people do bother to capitalize many of the words in their queries, and the study noted that as well – as I quoted from the paper:

    However, the great majority of queries contain no informative capitalization, so the great majority of proper nouns in search queries must be uncapitalized. We cannot, therefore, rely on capitalization to identify proper nouns.

  31. I’ve always built links with capital letters…never thought it mattered. Do you know if Google provides different search results for the same words when starting with capital letters vs. lowercase, such as “Spiderman 2” and “spiderman 2”? I’ve run a few tests and don’t see a difference.

    I know this isn’t the point of your post, but I’m looking at it from a link building perspective.

  32. Hi Sam,

    People are pretty inconsistent in their use of capitalization in queries. As the study noted:

    However, the great majority of queries contain no informative capitalization, so the great majority of proper nouns in search queries must be uncapitalized. We cannot, therefore, rely on capitalization to identify proper nouns.

  33. Interesting post.
    On my side, 90% of the time people use lower case keywords. Also, keywords with 2 words are the most used.
    I also found that people who used more than 4 words visit few pages.

  34. Hi Luc,

    I’m not sure that I’m very surprised that when people type queries into a search box they rarely capitalize proper nouns. They are, after all, searching, and not corresponding or communicating with someone.

    I wonder if most people who use longer queries tend to find the information that they are looking for quicker, or assume that if there isn’t an adequate response to a longer query that the information they are seeking doesn’t exist on your pages.

  35. One thing that they didn’t discuss was that 25% of searches are truly unique in the keywords according to Marissa Mayer of Google. Also when trying to optimize for a single phrase it tends to be nearly impossible as everybody searches or thinks to type in a truly unique phrase. For example some people still think of Google as Ask Jeeves where they phrase an extensive question stuffed with keywords such as: Whats the best place to eat lobster in San Diego, California? Where the next related search may just be “Best Lobster San Diego”

    Notice the huge difference but search is getting so intelligent now that when locally you will receive a map with ratings, numerous dining review websites, and a lobster restaurant or two. Also using proper nouns for a name of a city shouldn’t affect the search as opposed to searching for common names with and without proper capitalization.

  36. I study keyword stats all the time, to see what should be ranked for next, and it amazes me how people get to the same website by so many different phrases. Some search in the form of a question, others misspell one or more words, and they still get there. Example: spring, tx – spring texas – sprangtx – city of spring texas – where is springtx? etc. As long as they find my clients I am happy, and so are they.

  37. Hi David,

    This study was from Yahoo, which is probably why they didn’t use Google statistics and why I didn’t include that Google stat myself. Marissa Mayer may have used that statistic in a public speech, but I think it was originally attributed to Udi Manber, Google’s VP of Engineering, who said that 20% to 25% of the queries that they see at Google on a regular basis are unique.

    People search the way that they want to, and some will type out full questions rather than a string of keywords. I agree – the search engines are striving to guess the situational intent behind searches, such as whether or not to include maps or reviews in search results.

  38. Hi Tulum,

    People do get to the same page by many different ways, but I don’t think that it should come as a surprise that search engines are concerned about how people use their service, and are trying to get a better grasp at the different strategies and approaches that searchers use to try to find information.

  39. I think the search behavior of people is very interesting and is constantly evolving. I have noticed over time that search users are becoming more savvy in the way they look for things online. The long tail search queries for some of my clients websites are up considerably over the past couple years, but it really depends on the types of products or services they offer. Very good post!

  40. Hi Scott,

    I agree with you. I think that the ways that people search have been evolving, and their expectations of search engines have been changing as well. There have been some interesting studies looking at query logs from different search engines to explore things like the average length of queries, and so on. Would love to see the research conducted in the experiment above performed again in the future and compared to the original results.

  41. It is imperative to understand how web surfers use search queries online among other behaviours like navigation, on page conversion patterns etc… At the same time google is sooooo much more when you use a long query notably because marketers have gotten so good at predicting what users are searching for specially for searches related to products and services. Great Post!

  42. Hi Moe,

    The search engines are spending as much time studying how people perform queries as anyone else. I think that drives some of the changes we see at places like Google.

  43. Great article. A lot of people don’t understand how people search for items around your keyword. Doing the research on what keywords people use before planning your attack will save you a lot of time and aggravation. If you know a lot of nouns are being used, figure out what that means for your specific keyword and really go after it with original content and backlinking.

  44. Hi Sam,

    Thank you. Keyword research and understanding how people might search to find what you offer is definitely one of the essentials behind SEO, and one of the keys to success. It’s not always easy, but it’s worth doing and doing right.

  45. It’s best to research which keywords are searched frequently but don’t have much traffic and then try to rank for those keywords. After ranking and getting some good traffic go after the more competitive keywords.

  46. Hi Clark,

    Good points. As long as those terms are appropriate for the content of your pages, it can be a lot easier to rank for them than for terms where you are taking on sites that have been online and established for a while, and that you may have considerable problems competing against (at least initially).

  47. Good info Bill. One question though, you mention in the article about capitalization and non-capitalization of words. It has been my understanding that search results are not affected by whether the searcher capitalized the word or not. Could you give me your feedback on that point?

  48. Hi Mike,

    The paper I wrote about in this post is a study from Yahoo researchers on how people formulate their queries, and it looked at a number of different factors, including whether or not people capitalized proper nouns and other words in queries.

    In one of the quote boxes in my post, I included their conclusion when it came to capitalization, which was that people seemed to rarely include capitalization in their queries. Based upon that conclusion, it probably wouldn’t be a wise thing for search engines to consider capitalization when they ranked search results.

    I’ve tried a good number of queries personally using capitalization and not using capitalization to see if I would get different results in different search engines, and so far I haven’t seen any differences in the results returned to me.

  49. If I am searching locally, I usually start with the City + Keyword. EX: “Laguna Beach Italian Food”. But ever since Siri came out for the iPhone, people are saying “Where’s an Italian restaurant in Laguna Beach?” and now I see people find my site and are searching with the keywords first: “Italian Restaurant in Laguna Beach”. I guess my point is, it’s always best to mix up your phrases and combinations to maximize the visitors.

  50. Hi Clark,

    Definitely a good point. The way that people search does change and evolve over time. It’s interesting that the natural language queries that Siri responds to are probably driving the change that you’ve pointed out. It’s probably good to keep an eye on your analytics for those types of changes as well.

  51. If you accept the division of natural languages into isolating, agglutinative and inflective, then all Internet search languages are at the isolating level.
    Hence the need to improve them, starting by merely creating lists of parts of speech and lists of grammatical affixes, followed by lists of their semantical values in grammatical constructions.
    Otherwise, we will stop thinking European and use Gbe (http://en.wikipedia.org/wiki/Gbe_languages) to google.
