Redefining Navigational Queries to Find Perfect Sites

Many search engine researchers classify the queries that people type into a search box into three kinds, based upon the intent of the searcher: navigational, informational, and transactional. Navigational queries have traditionally been seen as searches where the searcher intends to reach a specific, known site.

Imagine instead considering a navigational query to be one where a perfect site exists that is an ideal one for a search engine to show to a searcher in response to that query, regardless of whether they knew about the site or not. A search engine might put that perfect site at the top of search results, and not worry too much about other results shown.

When is a query a navigational query, and when might a site be considered a perfect site for that query?

A recent patent application from Yahoo transforms the meaning of what a navigational query is, and finds a way to automate the process of determining whether a query is navigational, and whether a perfect page exists for that query.

A couple of papers that discuss the different types of queries, including navigational ones, are:

A taxonomy of web search (pdf)
Determining the User Intent of Web Search Engine Queries (pdf)

Traditionally, a navigational query has been seen as one where people do know which site they want to visit, but don’t know the exact URL of the site, or may not want to type in the full URL of the site into a browser address bar. Research has shown that around 18% of search queries are navigational.

If a search engine can determine that a query is navigational, it can just show a result including the page that the searcher is looking for, without being too concerned about which other sites are listed after that top result.

A number of machine learning and query classification approaches might allow a search engine to determine if a query is navigational in real time.

Techniques for navigational query identification
Invented by Yumao Lu, Fuchun Peng, Xin Li, and Nawaaz Ahmed
US Patent Application 20080059508
Published March 6, 2008
Filed: August 30, 2006

Abstract

To accurately classify a query as navigational, thousands of available features are explored, extracted from major commercial search engine results, user Web search click data, query log, and the whole Web’s relational content.

To obtain the most useful features for navigational query identification, a three level system is used which integrates feature generation, feature integration, and feature selection in a pipeline.

Because feature selection plays a key role in classification methodologies, the best feature selection method is coupled with the best classification approach to achieve the best performance for identifying navigational queries.

According to one embodiment, linear Support Vector Machine (SVM) is used to rank features and the top ranked features are fed into a Stochastic Gradient Boosting Tree (SGBT) classification method for identifying whether or not a particular query is a navigational query.

Navigational Queries

For classification purposes, you could break down web search queries into two categories: navigational and informational (everything that isn’t navigational). An informational query may return a list of several sites that would each be of interest to a searcher in response to the query that they used.

A well-known definition of a navigational query is one where a searcher already has a Web site in mind and the purpose behind the query is to reach that particular site. For example, I’ll type espn into a toolbar search box to save myself from typing an additional “.com” when going to the espn web site.

It’s not always easy to determine if a query is navigational in nature.

Redefining what a navigational query is might make that determination easier. Instead, consider a query to be navigational if it has one and only one “perfect” site that shows up in a search result set in response to the query.

A site might be considered “perfect” if the site contains complete information about the query and lacks nothing essential.

So, a navigational query is one where there is a “corresponding result page that conveys perfectness, uniqueness, and authority.”

This approach doesn’t require a searcher to actually have a specific site in mind.
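
Here is a minimal sketch of that “one and only one perfect site” test. It assumes that each result URL already has some kind of “perfectness” score between 0 and 1; the scores, the function names, and the 0.9 cutoff are all my own placeholders, since the patent application doesn’t reduce the definition to a single number like this.

```python
from typing import Dict, Optional

# Hypothetical cutoff; the patent application does not give a numeric threshold.
PERFECT_THRESHOLD = 0.9


def find_perfect_site(scores: Dict[str, float],
                      threshold: float = PERFECT_THRESHOLD) -> Optional[str]:
    """Return the single URL scoring at or above the threshold, or None.

    `scores` maps each result URL to an assumed "perfectness" score in [0, 1].
    """
    perfect = [url for url, score in scores.items() if score >= threshold]
    # The query counts as navigational only when exactly one result qualifies.
    return perfect[0] if len(perfect) == 1 else None


def is_navigational(scores: Dict[str, float],
                    threshold: float = PERFECT_THRESHOLD) -> bool:
    """True when the result set contains one and only one "perfect" site."""
    return find_perfect_site(scores, threshold) is not None
```

Under this sketch, a result set with several equally strong pages is treated the same as a result set with none: neither has a unique perfect site, so the query isn’t labeled navigational.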

Some Examples of Navigational Queries from the Patent Application

Someone searches for the query “Fulton, N.Y.” They might not know about the site “www.fultoncountyny.org,” but it has unique authority and perfect content for that query. That query might be labeled a navigational query.

A query of “national earth science teachers association” has only one perfect corresponding URL, “http://www.nestanet.org/”, and could be labeled as a navigational query.

In contrast, a search for the query “Canadian gold maple leaf” returns a number of very good URLs in search results, including “http://www.goldfingercoin.com/catalog-gold/canadian-maple-leaf.htm”, “http://coins.about.com/library/weekly/aa091802a.htm”, and “http://www.onlygold.com/Coins/CanadianMapleLeafsFullScreen.asp”. With no single perfect result, it would be labeled as a non-navigational, or informational, query.

Result Set Based Navigational Query Identification System

To automatically identify navigational queries, the patent application describes a process along these lines:

A query is received by the search engine, and a set of URLs for pages is returned.

The query and URLs are sent to a multi-level feature extraction system which looks for a number of features such as:

a) The number of times that query terms appear alone and/or together and where in the page and/or URL;
b) The number of times that query terms appear alone and/or together in inbound and/or outbound anchor text;
c) Click-through rates;
d) Session information (e.g., average time on page);
e) The location of links in the page; and,
f) Many other features.
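
As a rough illustration of feature (a) above, the sketch below counts how often query terms appear alone and together (as an exact phrase) in a URL, a title, and body text. The function names, the three fields, and the simple tokenizer are my assumptions for the example, not details from the patent application.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_match_features(query: str, url: str, title: str, body: str) -> Dict[str, int]:
    """Count query terms appearing alone and together in each field."""
    terms = tokenize(query)
    n = len(terms)
    fields = {"url": url, "title": title, "body": body}
    features: Dict[str, int] = {}
    for name, text in fields.items():
        tokens = tokenize(text)
        # Terms counted alone: total occurrences of any query term.
        features[f"{name}_term_hits"] = sum(tokens.count(t) for t in terms)
        # Terms counted together: occurrences of the full query as a phrase.
        features[f"{name}_phrase_hits"] = (
            sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == terms)
            if n else 0
        )
    return features
```

Each (query, URL) pair would get its own dictionary of counts like this, which could then sit alongside click and link features in a larger feature vector.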

Resources used to generate these features include:

A click engine – recording and analyzing user click behavior. The click engine might create hundreds of features automatically based on user click-through distributions. For example one click feature might be a click ratio, which is the ratio of the number of clicks on a particular URL for a specific query compared to the total number of clicks for the query.

A Web map – stores hundreds of features such as page content, anchor text, and the hyperlink structure of Web pages, including the inbound URLs, outbound URLs, etc. These web map features determine how a URL may be related to other pages within a site through terms in the query itself. An example of an anchor text feature might be the actual visible text in links pointing to a page. If many links to a page use the same anchor text, it’s an indication that the page is about whatever is contained in the anchor text.

Query logs – provide features that are based on a set of words and various language model based features from all the queries issued by users over a period of time.
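
Two of the features mentioned above are easy to sketch. The first function computes the click ratio as described: clicks on a URL for a query, divided by all clicks for that query. The second is my own guess at one way an anchor text feature might be measured, as the share of inbound links that use the most common anchor text. The log format (a list of query and URL pairs) and both function names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def click_ratios(click_log: Iterable[Tuple[str, str]], query: str) -> Dict[str, float]:
    """Map each clicked URL to its share of all clicks recorded for `query`.

    `click_log` is an assumed format: one (query, clicked_url) pair per click.
    """
    clicks = Counter(url for q, url in click_log if q == query)
    total = sum(clicks.values())
    if total == 0:
        return {}
    return {url: count / total for url, count in clicks.items()}


def dominant_anchor_text(anchors: List[str]) -> Tuple[str, float]:
    """Return the most common inbound anchor text and the share of links using it.

    Anchor texts are lowercased and whitespace-normalized before counting.
    """
    normalized = [" ".join(a.lower().split()) for a in anchors if a.strip()]
    if not normalized:
        return ("", 0.0)
    text, count = Counter(normalized).most_common(1)[0]
    return (text, count / len(normalized))
```

A click ratio close to 1 for a single URL, or one anchor text shared by most inbound links, would both point toward a single dominant page, though on their own neither would be enough to call a query navigational.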

Features from these sources may be selected for a machine learning program to train a classification model that can determine whether or not the query is a navigational query.
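
The abstract mentions using a linear SVM to rank features before the top ranked ones are passed to a classifier. Below is a minimal sketch of just that ranking step, under some loud assumptions: the trainer is a plain subgradient descent on the hinge loss that I wrote for illustration (not the patent’s implementation), features are ranked by the absolute size of their learned weight, and the Stochastic Gradient Boosting Tree stage is left out entirely.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     epochs: int = 200, lr: float = 0.1,
                     reg: float = 0.01) -> Tuple[List[float], float]:
    """Fit a linear SVM with subgradient descent on the L2-regularized hinge loss.

    Labels in `y` must be +1 (navigational) or -1 (not navigational).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularization shrinks the weights on every step.
            w = [wj * (1 - lr * reg) for wj in w]
            if margin < 1:
                # The example is inside the margin: move toward classifying it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def rank_features(names: Sequence[str], w: Sequence[float]) -> List[str]:
    """Order feature names by the absolute value of their SVM weight."""
    ranked = sorted(zip(names, w), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked]


def top_k_features(names: Sequence[str], w: Sequence[float], k: int) -> List[str]:
    """Keep only the k highest-ranked features for the downstream classifier."""
    return rank_features(names, w)[:k]
```

Ranking by weight size only makes sense when the features are on comparable scales, so a real pipeline would likely normalize them first. The selected features would then be used to train the final classifier.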

Conclusion

The patent application goes into detail on some of the different types of machine learning and classification processes that could be used, considering how many features exist when comparing a query with each of the URLs that show up among the top results (perhaps the top 100) for that query.

So, this process is one that reranks search results based upon whether there is one site that seems to be a perfect match for a specific query.
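
A reranking step like that could be as simple as the sketch below, which moves a perfect site to the first position and leaves the order of the other results alone. The function name is hypothetical, and it assumes that some earlier step has already decided which URL, if any, is the perfect one.

```python
from typing import List, Optional


def promote_perfect_site(results: List[str], perfect_url: Optional[str]) -> List[str]:
    """Return a copy of `results` with `perfect_url` moved to the top.

    If there is no perfect site, or it is not in the result set,
    the original order is returned unchanged.
    """
    if perfect_url is None or perfect_url not in results:
        return list(results)
    return [perfect_url] + [url for url in results if url != perfect_url]
```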

Do the other search engines do something like this too?

While reading through the patent application, I thought about how Google shows site links for some search results. I described a little of which pages Google might decide to show in site links in the post, Google’s Listings of Internal Site Links for Top Search Results, though one of the questions I had there was why Google decided to show site links for some sites, and not for others.

Those site links are supposed to help searchers more easily find pages that could be considered final destination pages on a site. That suggests Google may be treating the top result as a response to a navigational query, with the site links making destination pages within the site easier to reach.

It doesn’t appear to be a “perfect site” approach, but there does seem to be some process involved that determines that the query involved is one that should trigger site links – in other words, a navigational query.

19 thoughts on “Redefining Navigational Queries to Find Perfect Sites”

  1. d) Session information (e.g., average time on page);

    That strikes me as interesting… this could be derived from backclick to SERP, but I wonder where else they could derive this? Can they use toolbar data for this?

  2. Hi Richard,

    A back click to the search results might be one place to get the information. I would guess that it’s a possibility that moving to a new page, with the toolbar recording that action (or ISP logs) could possibly be another.

  3. Thank you for writing about this, Bill.

    It certainly seems that becoming classified as a perfect site for a navigational query would be a wonderful achievement.

    It’s very interesting to see that Google’s approach seems different. Yahoo’s is a very useful, logical method.

    Miriam

  4. I came across this in Search Cap and to be honest had never thought of navigation in this manner. I simply thought of searches in terms of research and purchasing. I look forward to learning more about this subject.

  5. I came across this in Search Cap and to be honest had never thought of navigation in this manner. I simply thought of searches in terms of research and purchasing. I look forward to learning more about this subject.

    Me too, I’m interested how much this will evolve and help my website :).

  6. I am surprised to know that 18% of the search queries are navigational. Come to think of it, I do it myself quite often, even when I am pretty sure of the website URL. I do it to avoid mistyping the URL, or to ensure that I am not typing a .com when the site is a .net or .info. Obviously I do not want to land on a typo site filled with ads.

  7. Hi Miriam,

    Thank you.

    Becoming seen as a perfect site in response to a query does seem to be an ideal situation to be in, especially if the query term is a unique brand name.

    I do like the Yahoo approach described here. I’m not sure that we see as much of what Google is doing, but their use of site links shows that they seem to be relating some queries as navigational when it comes to specific sites.

    Hi Scott,

    Thanks for stopping by. I think that viewing queries as informational, navigational, and transactional can transform the way that you think about how people search, and the intents with which they may arrive at a web site.

    If a search engine tries to understand those intents, and can do so by thinking about navigational queries somewhat differently, it may help make it easier for them to deliver people to pages that those people actually are trying to find.

    Hi nds roms

    Thanks. I’ll be looking forward to more information about this approach if I see anything. I’d be interested in any experiences you might have with navigational queries and search results.

    Hi Thomas,

    Thanks. Those are some good reasons why people might perform navigational queries that I didn’t mention, and the patent application didn’t include.

    I’d be interested in hearing from others who might read this post why they perform navigational queries.

  8. Do you have a reference for this statement?

    Research has shown that around 18% of search queries are navigational.

    I’m doing some research and this reference would be useful.

  9. Hi Jack,

    The percentage is from the Yahoo patent application:

    [0002]Web search has become a very popular method for seeking information. Users may have a variety of intents while performing a search of the Web. For example, some users may already have in mind the site they want to visit when they enter a query, however, the users may not know the URL of the site or may not want to type in the full URL, and may rely on the search engine to present a link to the site they know they want to visit. By contrast, other users may have no idea of what sites to visit before seeing the search results, where the information these users are seeking typically exists on more than one page. According to research, approximately 18% of queries in Web search are navigational queries, i.e., queries reflecting the situation when the user already has in intended site in mind.* Therefore, correctly identifying navigational queries has a great potential to improve search performance. However, navigational query identification is not trivial due to a lack of sufficient information in Web queries, which are normally quite brief.

    * emphasis is mine.

    You will see other percentages in other places. For instance, Dan Russell of Google was reporting a 15% rate in a presentation at Stanford in November 2006, that I wrote about at Search Engine Land in Why Do People Google Google? Understanding User Data to Measure Searcher Intent

    I believe that at least one of the two papers I linked to at the start of this post also includes a somewhat different number.

    There are different ways to measure what a navigational query is, and the researchers were likely looking at different data sets when making that determination.

    Good luck in your research.

  10. A new study from Penn State breaks it down as 80% of searches are informational and 10% are navigational, with another 10% cited as transactional. The report is said to be 76% accurate so these figures could obviously change.

  11. Hi Scott,

    The research from Penn State isn’t all that new. See my link in the post above, which points to a poster from the study you are writing about, which was presented at the WWW 2007, held May 8–12, 2007, in Banff, Alberta, Canada, (Determining the User Intent of Web Search Engine Queries). I wrote about this study last May (2007) in User Intent and Characteristics of Search Queries

    The data used in that study was taken from Dogpile in 2005. I’m a little wary of query data from Dogpile, to be honest. Who uses Dogpile?

    A longer version of that study is here:

    Determining the informational, navigational, and transactional intent of Web queries (pdf)

  12. This study was referenced on Search Engine Land and the link they have said the study will be appearing in a May periodical. Why would they use something so old? Isn’t it time they updated their paper?

  13. Hi Scott,

    The study was originally presented in 2007 as a short poster, and is being presented at the same conference a year later now as a more detailed research paper – even though the longer paper was available online last year.

    It’s possible that the authors might have done some additional research for the May publication, but I’m not sure that they have.

    While the article is almost a year old, it is worth a look.

  14. Ask have released its top 10 list for web search terms 2008 and half of them are navigational (e.g. ‘Google’)

    Yahoo have released a filtered list with no navigational queries… but IMHO, according to this “funny” move it should be the same as Ask and therefore very effective.

    I think it is yet another internet conspiracy but you will be the judge :)
