How Google Might Ignore Insignificant Terms in Queries

The most important step in keyword research is entering a keyword phrase into a search engine like Google, seeing which results show up, and trying to understand why those pages appear. If you can’t do that, then it’s time to dig down and start learning.

Whether you’re a searcher looking for information on the Web, or someone doing keyword research for a website, it’s important to have an idea of the many different ways that a search engine might treat a search you perform. For instance, if your search is one that might trigger Google to show results from a specific web page associated with a named entity (a particular person, place, or thing) at the top of those results, you shouldn’t necessarily be surprised to see that site listed first in search results. This is something that is done algorithmically by Google. Just stating that Google has a “magical” brand preference is a mistake in that instance. It’s better to try to understand how that algorithm might be triggered instead.

Would you eat this mushroom before researching and investigating first whether or not it was safe?

Likewise, when you perform a search for a term such as [hospice], Google might decide to show a map result from Google Maps in Web search results because its universal search algorithm suggests that the query has a local intent, and the searcher is likely looking for a nearby hospice. Again, it would be a mistake to assume that Google is favoring its own “property” in Google Maps when the reality is that the vertical search result from Google Maps is what searchers are actually looking for.

On Relevance and Search Engines

Relevance matters to each of us on a daily basis. It enables us to focus upon the things that are important in our lives. It’s something that each of us learns about every day, and has been learning about since around the time that we first learned to crawl, though not necessarily consciously.

Relevance and Evidence

I first began purposefully studying relevance a number of years ago, but not to help websites show up in search engines. My introduction to relevance as something I needed to learn, and needed to learn well, came in law school, in classes like Evidence and Criminal and Civil Procedure. In Evidence, we spent the class learning about the rules of evidence. Under the Federal Rules of Evidence, evidence is relevant if:

(a) it has any tendency to make a fact more or less probable than it would be without the evidence; and

(b) the fact is of consequence in determining the action.

How a Search Engine Might Weigh the Relevance of Anchor Text Differently

One of the things that’s clear about how search engines work is that when they find a link pointing to a page using certain anchor text, that page might be seen to be a little more relevant for the text found in that link. Google pointed that out in one of the earliest white papers about how the search engine works:

This idea of propagating anchor text to the page it refers to was implemented in the World Wide Web Worm [McBryan 94] especially because it helps search non-text information, and expands the search coverage with fewer downloaded documents. We use anchor propagation mostly because anchor text can help provide better quality results. Using anchor text efficiently is technically difficult because of the large amounts of data which must be processed. In our current crawl of 24 million pages, we had over 259 million anchors which we indexed.

The Anatomy of a Large-Scale Hypertextual Web Search Engine
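
The anchor propagation idea in the quote above can be illustrated with a minimal sketch. It is not the implementation from the paper; the function name, the tuple layout of a link, and the example URLs are all assumptions made for illustration. The point it shows is that anchor text gets indexed under the page a link points to, which is why a non-text target such as an image can still be found by words it does not contain.

```python
from collections import defaultdict


def propagate_anchor_text(links):
    """Index each link's anchor text under the page the link points to.

    `links` is an iterable of (source_url, target_url, anchor_text) tuples.
    That shape, like this function, is a hypothetical chosen for the example.
    """
    # term -> target_url -> number of links using that term in their anchor text
    anchor_index = defaultdict(lambda: defaultdict(int))
    for _source, target, anchor_text in links:
        for term in anchor_text.lower().split():
            anchor_index[term][target] += 1
    return anchor_index


# Made-up links; the last target is an image with no text of its own.
links = [
    ("a.example/home", "b.example/sushi", "sushi bar"),
    ("c.example/food", "b.example/sushi", "best sushi"),
    ("c.example/food", "d.example/photo.jpg", "mushroom photo"),
]
index = propagate_anchor_text(links)
```

Here a lookup of the term "mushroom" would find the image at d.example/photo.jpg, even though a crawler could not read any text from that file.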

But many people assume that each link, with its anchor text, is as important as any other link. They also assume that a page with lots of links pointing to it that share certain anchor text will rank more highly for the terms in that text than it would without those links.

How Google May Boost Search Rankings for Your Relevant Pages Using Keywords in the Same Category as Your Website

Imagine that Google assigns categories to every webpage or website that it visits. You can see categories like those for sites in Google’s local search. Now imagine that Google has looked through how frequently certain keywords appear on the pages of those websites, how often those pages rank for certain query terms in search results, and user data associated with those pages.

One of my local supermarkets has a sushi bar, and they may even note that on their website. But the keyword phrase [sushi bar] is more often found on, and associated with, documents in a category of “Japanese Restaurants,” based upon how often that phrase tends to show up on Japanese restaurant sites, and how frequently Japanese restaurant sites tend to show up in search results for that phrase.

Since Google can make a strong statistical association between the query [sushi bar] and documents that would fall into a category of “Japanese restaurants,” it’s possible that the search engine might boost pages that have been categorized as “Japanese restaurants” in search results on a search for [sushi bar]. My supermarket [sushi bar] page might not get the same boost.

That’s something that a Google patent granted earlier this week tells us.
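
To make the idea of a category boost concrete, here is a rough sketch. It is not taken from the patent: the function names, the `boost_weight` parameter, the multiplicative formula, and the counts are all assumptions invented for the example. It only shows how a statistical association between a query and a category could lift pages in that category.

```python
def query_category_association(query, category, counts):
    """Share of results observed for `query` that fall in `category`.

    `counts` maps query -> {category: number of ranking pages}. The data
    structure and the numbers used below are hypothetical.
    """
    per_category = counts.get(query, {})
    total = sum(per_category.values())
    return per_category.get(category, 0) / total if total else 0.0


def boosted_score(base_score, page_category, query, counts, boost_weight=0.5):
    """Scale a base relevance score by how strongly the page's category is
    associated with the query. The formula is an assumption for this sketch."""
    association = query_category_association(query, page_category, counts)
    return base_score * (1.0 + boost_weight * association)


# Made-up counts: most pages ranking for [sushi bar] are Japanese restaurants.
counts = {
    "sushi bar": {"Japanese Restaurants": 80, "Supermarkets": 5, "Other": 15},
}
restaurant = boosted_score(1.0, "Japanese Restaurants", "sushi bar", counts)
supermarket = boosted_score(1.0, "Supermarkets", "sushi bar", counts)
```

With the same base score, the page categorized as a Japanese restaurant ends up ahead of the supermarket page, which matches the supermarket sushi bar example above.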

How a Search Engine Might Weigh Pages with Relevant Annotations Higher in Search Results

One of the words that often appears when someone describes how search engines work is relevance. A search engine attempts to show searchers web pages and other results that might be relevant to the words that they used when they perform a search. Yet, there are a number of different ways that you can define relevance.

For instance, Rutgers professor Tefko Saracevic, who has been studying the concept of relevance for years, explores different thoughts and literature on the topic to describe a number of ways to define relevance in a 2006 paper, Relevance: A Review of the Literature and a Framework for Thinking on the Notion in Information Science. Part II: Nature and Manifestations of Relevance.

Relevance could be considered a way of finding documents that contain words someone might search for, or documents that are related to concepts involved in those query terms. Relevance could be determined by looking at a relationship between a searcher and the search terms they use, while considering their past browsing and searching history, and possibly the searches of people who might be socially related to them, or who share some common interests with them.

Relevance could also be determined by a problem or task that a searcher is faced with when performing a search.

Search Engines Applying Different Anchor Text Relevance from the Same Site and Related Site Links

Anchor text in a link pointing to a page is often used by search engines to determine what a page being linked to is about, and to determine what words and phrases that page is relevant for.

But, there are a number of issues raised when anchor text is used by search engines in that way. Here are a few of them:

  • If a page points two links to the same destination page using the same anchor text in both links (for example, in the navigation and the footer of the page), should the relevancy of that link text be weighted twice as much as if there were only a single link from the source page?
  • If there is a link on every page of a site to a single page of that site (a sitewide link) using the same anchor text, should each of those links accumulate in weight to determine how relevant that page might be for the text used in those links?
  • If there are multiple links on a page to another page, or sitewide links to that other page, and the anchor text is different in each link, should the text in each link carry the same amount of weight in determining what the page being linked to is about?
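
The questions above come down to how repeated anchor text gets counted. The sketch below compares three counting policies side by side. None of them is claimed to be what any search engine actually does; the function name, the policy names, and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def anchor_weights(links, target, policy="every_link"):
    """Count how often each anchor text points at `target` under one policy.

    `links` is an iterable of (source_url, target_url, anchor_text) tuples.
    Policies (all hypothetical):
      "every_link"    - every link counts, including duplicates on one page
      "once_per_page" - the same text counts once per source page
      "once_per_site" - the same text counts once per source host
    """
    seen = set()
    weights = {}
    for source_url, target_url, anchor_text in links:
        if target_url != target:
            continue
        text = anchor_text.strip().lower()
        if policy == "every_link":
            key = None
        elif policy == "once_per_page":
            key = (source_url, text)
        elif policy == "once_per_site":
            key = (urlsplit(source_url).netloc, text)
        else:
            raise ValueError(f"unknown policy: {policy}")
        if key is not None:
            if key in seen:
                continue  # this source already contributed this anchor text
            seen.add(key)
        weights[text] = weights.get(text, 0) + 1
    return weights


contact = "https://shop.example/contact"
links = [
    ("https://shop.example/", contact, "Contact us"),         # navigation link
    ("https://shop.example/", contact, "Contact us"),         # footer link
    ("https://shop.example/about", contact, "Contact us"),    # sitewide link
    ("https://shop.example/about", contact, "Get in touch"),  # different text
    ("https://other.example/", contact, "Contact us"),        # external site
]
every = anchor_weights(links, contact, "every_link")
per_page = anchor_weights(links, contact, "once_per_page")
per_site = anchor_weights(links, contact, "once_per_site")
```

Under these made-up links, "contact us" is counted four times when every link counts, three times when counted once per page, and twice when counted once per site, while "get in touch" is counted once under all three policies.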

