Relevant Annotations
One of the words that often appears when someone describes how search engines work is relevance. A search engine attempts to show searchers web pages and other results that might be relevant to the words that they used when they perform a search. Yet, there are a number of different ways that you can define relevance.
For instance, Rutgers professor Tefko Saracevic, who has been studying the concept of relevance for years, surveys the literature on the topic and describes a number of ways to define relevance in a 2006 paper, Relevance: A Review of the Literature and a Framework for Thinking on the Notion in Information Science. Part II: Nature and Manifestations of Relevance.
Relevance could be considered a way of finding documents that contain words someone might search for, or documents that are related to concepts involved in those query terms. Relevance could be determined by looking at a relationship between a searcher and the search terms they use while considering their past browsing and searching history, and possibly the searches of people who might socially be related to them, or who share some common interests with them.
Relevance could also be determined by a problem or task that a searcher is faced with when performing a search.
Search engines have been exploring some of these different concepts of relevance as well, and a recently granted patent from Google redefines the way that we might perform searches to help searchers find relevant pages when they are faced with informational needs or tasks that they want to fulfill.
Under the process described in the patent, in addition to using a query term in our search, we would also include a label that might match annotations made on pages that could be returned in search results.
For example, someone searching for information about digital cameras might want to see professional reviews about cameras. They might enter a query at Google that would look like this:
digital cameras label:professional reviews
The search results that they would see would show the pages that are relevant for the query term “digital cameras,” and would weigh pages that are labeled “professional reviews” as more relevant to the search than pages that don’t have labels attached to them.
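To make that weighting concrete, here is a minimal Python sketch. It is only an illustration and not the method from the patent: the `Page` class, the `base_score` field, the `boost` multiplier, and the rule that everything after "label:" counts as a single label are all assumptions made for the example.

```python
from dataclasses import dataclass, field

LABEL_OPERATOR = "label:"  # operator name taken from the example query above


def parse_labeled_query(raw_query):
    """Split a raw query into (query terms, label).

    Assumption: everything after the first "label:" is one label, so
    "label:professional reviews" yields the label "professional reviews".
    """
    terms, sep, label = raw_query.partition(LABEL_OPERATOR)
    return terms.strip(), (label.strip().lower() if sep else None)


@dataclass
class Page:
    url: str
    base_score: float  # hypothetical relevance score for the query terms alone
    labels: set = field(default_factory=set)


def rank(pages, label, boost=2.0):
    """Sort pages by score, multiplying the score of pages that carry the label."""
    def score(page):
        return page.base_score * (boost if label in page.labels else 1.0)
    return sorted(pages, key=score, reverse=True)


terms, label = parse_labeled_query("digital cameras label:professional reviews")
pages = [
    Page("a.example/specs", 0.9),
    Page("b.example/review", 0.6, {"professional reviews"}),
]
ranked = rank(pages, label)  # the labeled page now outranks the unlabeled one
```

In this sketch, unlabeled pages still appear in the results. They are simply ranked lower, which matches the idea of weighing labeled pages as more relevant rather than excluding everything else.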
The image below from the patent shows pages in search results that have a “symptoms” label associated with them on a search for cancer that includes a “labels: symptoms” search operator:
Annotations as Labels
Personalized search often looks at past browsing and searching history to try to identify pages that might be “relevant” to a searcher’s intent by attempting to understand the interests of a searcher. But that information may not be very helpful when someone is attempting to find information relevant to a task at hand that has nothing to do with pages they viewed in the past.
A search engine will also sometimes show query suggestions to a searcher based upon pages that other searchers ended up visiting when they entered the same or similar terms into a search box. But it’s possible that those searchers had very different intentions behind their searches.
If a searcher were to add more information about what they were looking for, such as the labels mentioned above, it might help a search engine find more relevant results based upon the situation behind a search.
But how does a search engine create those labels, and associate them with web pages?
A web site focusing specifically on health issues might include tags or categories for articles published on the site. For instance, articles about allergies might be tagged with terms such as “symptoms,” or “treatment,” or “medications.” A web site about digital cameras may also annotate specific pages with tags such as “expert review,” or “new product.”
The tags may be helpful on those sites, but you don’t see the annotations when you perform a search on a general search engine such as Google or Yahoo or Bing. Annotations might also be identified from comments made on a page.
If a search engine were to capture information such as the tags on sites like those, it might be a start, but there are many pages that don’t have explicit annotations on them, and that might not be labeled, even though they would possibly be helpful to searchers.
A search engine might attempt to find other ways to understand how annotations might apply to specific pages, such as looking at patterns in the URLs of pages. For instance, a web site about digital cameras might have a directory named “review,” as in “www.digitalcameraexample.com/review/.” An assumption might be made that the documents contained in that directory contain reviews of digital cameras, and a label of “professional reviews” might be applied to pages within that directory.
Another directory on that site might be “news,” as in “www.digitalcameraexample.com/news/.” Pages within that directory wouldn’t have a “professional reviews” label attached to them, but an “industry news” label might be instead.
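The URL-pattern idea can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not anything described in the patent: the `DIRECTORY_LABELS` mapping is hypothetical and simply mirrors the two directory examples above.

```python
from urllib.parse import urlparse

# Hypothetical mapping from a directory name to a label. The entries mirror
# the examples in this post and are not taken from the patent.
DIRECTORY_LABELS = {
    "review": "professional reviews",
    "reviews": "professional reviews",
    "news": "industry news",
}


def labels_from_url(url):
    """Return the labels suggested by the path segments of a URL."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to separate host from path
    segments = [s.lower() for s in urlparse(url).path.split("/") if s]
    return {DIRECTORY_LABELS[s] for s in segments if s in DIRECTORY_LABELS}


review_labels = labels_from_url("www.digitalcameraexample.com/review/")
news_labels = labels_from_url("www.digitalcameraexample.com/news/")
```

A lookup table like this only covers directory names someone thought of in advance, and it trusts that a directory's name describes its contents, which is the same weakness raised in the conclusion below.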
The Google patent is:
Filtering search results using annotations
Invented by Patrick F. Riley and Ramanathan Guha
Assigned to Google
US Patent 7,668,812
Granted February 23, 2010
Filed May 9, 2006
Abstract
A search engine system accepts queries that include query terms and labels applicable to certain documents. A domain filter is constructed that is used to filter search results to certain domains, where the domains are determined based on the labels included in the query. The filtered search results are processed to ensure that certain portions of the results are from domains included in the filter. The results are further processed to include the query labels with certain ones of the results.
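One way to read the abstract is as a three-step pipeline: build a domain filter from the query labels, make sure a portion of the results comes from domains in that filter, and attach the labels to those results. The Python sketch below follows that reading only loosely. The `LABEL_DOMAINS` store, the example domains, and the `size` and `min_share` parameters are all hypothetical, and the patent may define each step quite differently.

```python
import math
from urllib.parse import urlparse

# Hypothetical annotation store: label -> domains annotated with that label.
LABEL_DOMAINS = {
    "symptoms": {"health.example", "clinic.example"},
}


def domain_of(url):
    """Return the host part of a full URL (the scheme must be present)."""
    return urlparse(url).netloc.lower()


def filter_results(ranked_urls, query_labels, size=4, min_share=0.5):
    """Pick `size` results so that a share of them comes from labeled domains.

    Returns (url, labels) pairs in the original rank order, where `labels`
    lists the query labels whose annotated domains include that URL's domain.
    """
    # Step 1: build the domain filter from the labels in the query.
    allowed = set()
    for label in query_labels:
        allowed |= LABEL_DOMAINS.get(label, set())

    # Step 2: reserve slots for the best-ranked results from filtered domains,
    # then fill the remaining slots in original rank order.
    in_filter = [u for u in ranked_urls if domain_of(u) in allowed]
    quota = min(len(in_filter), math.ceil(size * min_share))
    chosen = set(in_filter[:quota])
    for url in ranked_urls:
        if len(chosen) >= size:
            break
        chosen.add(url)

    # Step 3: attach the matching query labels to each chosen result.
    return [
        (u, [l for l in query_labels if domain_of(u) in LABEL_DOMAINS.get(l, set())])
        for u in ranked_urls
        if u in chosen
    ]


results = filter_results(
    [
        "http://a.example/1",
        "http://b.example/2",
        "http://c.example/3",
        "http://health.example/cancer",
        "http://clinic.example/signs",
    ],
    ["symptoms"],
)
```

With these inputs, the two labeled health pages displace the third unlabeled result, so half of the four returned results come from domains in the filter.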
Conclusion
The idea of being able to add “labels” to query terms is interesting, but I wonder how many searchers would add labels to their searches.
Google does allow you to use other special search operators when you search. For example, if you want to find pages on a topic that are only from educational sites, you can perform a search like the following:
red dwarf stars site:.edu
There is a lot of value in being able to go to Google and perform a search like:
chicken pox label:symptoms
I would like to see a “label” search operator added as an option.
I do believe that being able to use labels like these would make it easier to find pages that are relevant for a particular situation.
I could see the possibility that people might intentionally apply tags to some pages that aren’t appropriate, or place pages within directories that aren’t good matches for the words found within the URLs. The patent filing doesn’t go into detail on how those possibly irrelevant “annotations” might be identified, but I would assume that there would be some way to filter those results out.
Note that Google does provide a way to include labels in Google Custom Search already.
Great find.
I’m on the same page.
Tags are also a great way to put some basic semantics into the sauce. Tags compiled into data groups can give some insight on relevance.
Amazing information Bill! I have been observing for about a year that links with related tags are getting indexed well and are ranking high on SERPs. As you said in the last paragraph, search engines might have filters to identify spam tags/labels. I believe, however, that most people might not be aware of focusing on labels or even on their URL structure, and this might be the reason search engines still weigh pages with those!
Good Post!
But I have a concern about the redundancy of such a feature: I believe it is safe to say that currently the crawler creates such ‘labels’ dynamically.
For example: each time the user enters the keywords ‘digital camera’, we can say that the crawler treats them as a ‘label’ or ‘tag’, dynamically marks the pages with the label, and then retrieves the result: pages marked with the label ‘digital camera’.
Would love to hear your opinion!
Like you, I am also wondering how many searchers would add labels to their searches. It is a bit time consuming and an added task. But I think they will if they know about this and understand its relevance. I only learned about this myself when I read your post! You are always adding to my knowledge. Thanks!
Hi Australia SEO,
Thanks for sharing your observations. It was interesting to see that Google might look at directory structures in URLs to attempt to derive “labels” for pages. It is definitely something to consider when setting up the URLs for a site, though I’m not sure that I would recommend that a site that has been around for a while, and is doing well in search results go out and change all of the URLs for their pages. Tags could be easier to implement for many sites, and if they have a positive impact could be helpful.
As I mentioned at the end of the post, it is possible to tag pages when you set up a custom Google search. Would Google use those custom tags in understanding what labels might be attached to pages when ranking those pages in its web search? It’s a possibility.
Ramanathan Guha, who is listed as one of the inventors on this patent filing has been one of the driving forces behind Google’s custom search, and his name appears on a number of other Google patent filings that focus upon that custom search.
Hi LaurentB,
Thank you. I’ve seen a lot of whitepapers from a lot of sources about tags and how they could be used to help understand the web from a semantic stance. They cover a lot of territory, from the kinds of tags used on bookmarking and photo sites, to the differences between tags left by users and automatically created tags, such as the type of camera that a picture might have been taken with. Geo tagging is also very interesting – especially tags of images associated with photos.
It’s going to be interesting to see where Google ends up going with this, and if they do give us the chance to use a “label” search operator like described in this patent filing.
Hi Dmitry,
I understand your concern, and it was something that bothered me somewhat as well.
This patent isn’t saying that pages that show up for certain query terms are tagged with those query terms, but rather sites that tag their own pages, or may be annotated in some other way by others through something like comments, are labeled. And pages that aren’t tagged in that manner, but that have URLs that might indicate a certain label is appropriate could also be included.
Hi Andrew,
Thanks. I would really love to see some data from one of the major search engines that tells us how many people do searches with some of the different specialized search operators. Would people use labels in a search? I would guess that some people would, but I don’t know.
If Google does add this feature, how would they go about educating people that they could use labels?
I, too, wonder how many people will take the time or even know about doing a search with labels. I haven’t seen statistics but I suspect very few people actually use Advanced Search. I think Google should make some of the advanced search features more prominent. I guess it would be wise, however, to prepare for more sophisticated searches, especially with a blog, to avoid having to update a ton of content.
Hi Bill,
That makes total sense about enabling labels, and it is probably something that is bound to be introduced at some stage. I guess it would take some training to get searchers to add labels during their search process. As a whole, I think it would make searching for specifics far more accurate and efficient for the searcher. I’m learning a lot from you with each article I read. Thank you.
Dave
Hi Chris,
I remember back in the days when Altavista was the number 1 search engine, before Google. I’m not sure I ever used the regular search there. I always went to the advanced search page. With Google, I rarely use advanced search. I think they should consider making some of the advanced search options more visible, too.
Hi Dave,
From what I understand, one of the reasons that Google introduced Universal Search is because people didn’t click on the tabs too much for the other kinds of searches that they offered, such as image search or news search, even though there were likely times when those were choices that might have been really good ones for the people doing the searches.
While I mentioned in my last comment that I would like to see Google make some of the advanced search operators more visible, I’m wondering how many people would use labels. If Google introduces the option, it will be interesting to see. It’s possible that some of the most popular ones might be offered in a way similar to search query suggestions that we sometimes see in search results.
Nice post Bill. It’s always interesting to read through these patents and push out some ideas on other factors engines could/do use to rank. I’d bet less than 1% of all queries include a search operator option like label:reviews or site:.edu. As these numbers grow and searchers become more savvy as a whole, it will be interesting to see this and other factors influencing results.
Hi Adam,
Thank you. I suspect that the percentage of people who use advanced search operators as search tools (rather than to do something like check information about their own sites, or competitors’ sites, or to do keyword research) is rather small too.
I’m not sure if many searchers (who know about them) are really all that excited about doing an allinanchor or allintitle or allinurl type search to help them find information, or will start doing so in the future. If Google introduces the “label” search operator, will more and more people start using it? I don’t know if they will. I would think that Google would have to take some steps to help educate their users on how it could be helpful.
I think the question of how many people would actually use this type of search operator is important. How many people who haven’t worked in SEO or a web related field are aware that Google can be used in this way? A very small percentage, I would suggest.
Hi Paolo,
Those are good points. Instead of people using a “label” type search operator, the search engine might instead offer those labels at the tops of search results based upon annotations made by others, categories determined by the search engine, and in other ways as well.
For example, you can use a “define” search operator to get a definition of something, such as “define:entropy”, but if your query is “what is entropy” (without the quotation marks), Google shows you a definition at the top of the search results. Likewise, the people at Google may try to find a way to offer labels to searchers without requiring them to use the “label” search operator.
Consumer search operators! Yes probably the way forward Bill. Multiple options other than ‘search’ would be welcome I’d expect and define is an interesting one.
Hi Paolo,
I think so as well – instead of additional searches, we might start seeing some options to refine results in intelligent ways.
It’s a good idea, but as stated by others, I think not many ‘casual’ searchers would use this feature, only SEO people and marketers. The problem with the search engines (especially Google) adding these labels at the top of the search results is that the results may begin to look ‘messy’ with the additional options. Google is as big as it is in large part due to the fact that the UID (user interface design) is so simple and easy to use.
Hi Richard,
It is a challenge that the search engines do face: trying to keep their interfaces as streamlined as possible to avoid clutter, confusion, and complexity. But they also seem to want to expand in a way that they can offer assistance and help to searchers when it seems like that help might be needed. That can come in the form of query suggestions that they might include with search results.
It’s more likely that someone would use a query suggestion rather than an advanced search operator like a “label:” in front of an additional word in a query. As an observer, it’s interesting to watch and see what Google might do to make it easier for searchers to find what they are looking for while simultaneously avoiding “messy” search interfaces.
It’s very interesting but i am quite skeptical
I don’t think people would add labels to their searches
People just want fast results
Hi Alex,
Google does presently allow people using Google’s custom search on their sites to apply labels to certain search results, but I’m guessing that you are right that most searchers won’t use labels with their queries.
But that doesn’t mean that Google might not start showing query suggestions or refinements with labels to searchers, if that means that they can get both fast results and avenues to explore on certain topics that they might not otherwise have considered searching for but might be interested in exploring.
I agree, people want fast results, they don’t have time for labels.
Hi Richard,
Chances are that if Google uses something like the process in this patent, instead of requiring people to type in a special “label” search operator, they would more likely present some or all of the labels as query refinements or “related searches.”
If you perform a search, and the first few pages in the results don’t seem very relevant for what you want to find, but one of the query refinements at the top of the page sounds more like what you are looking for, that may result in finding results faster.
Slightly off topic, but I have found that using the ‘allintitle’ operator stops working after a few searches and Google views this as spam. Have you come across any ways round this? It’s a useful tool.
Hi James,
It is a useful tool. The only way I’ve managed to work around that limitation is to use multiple browsers, and cycle through them – IE, Firefox, Chrome, Netscape, Safari, Flock, Opera, Orca Browser, Crazy Browser, etc. If you switch browsers, Google lets you perform a number of new searches before telling you that you appear to be running an automated program to query the search engine. If you switch enough times, and cycle through a number of other browsers, you’re pretty close to being able to conduct new searches on the first browser in the chain.
I seem to have also noticed this spam blocker more regularly when using the +.co.uk ending to search for links in Google, which seems to find more references than the link: operator. Anyone got other ways to check back links in Google specifically? I know Yahoo Site Explorer is useful, but which is more accurate?
Hi Paul,
Thanks for sharing that insight. It’s interesting if Google is policing country code specific tlds more strictly than non-country code tlds.
Of course, using Google Webmaster Tools can be a lot more effective in finding backlinks, but only if the site involved is one that you have control over, and have verified.
Using Yahoo Site Explorer can help reveal a lot more links than Google’s link operator, but I wouldn’t rely upon it to get a sense of how many links Google knows about, or considers when ranking pages. So, yes, it’s helpful for discovering links, but it’s not helpful for understanding what Google is doing when it comes to link analysis and PageRank assignments.
Hi Paul, there is a technique that works well for me using Open Site Explorer and Google Custom Search. You can find the complete process here:
http://www.seomoz.org/blog/replace-yahoo-linkdomain-with-google-custom-search-engine
Thanks Charles, useful stuff.
I’ve just been through this in full. I’m embarrassed to say I hadn’t come across the Google custom search engine! This looks like a great tool for backlink research.
No problem, good to know it helps. I don’t use it every time because it’s time consuming, but it’s worth it on bigger projects.
You can also use data from Yahoo, since Open Site Explorer is slower to index links. It requires some filtering in Excel though…
Thanks Charles, have used that tool you suggested and it worked a charm.