On Relevance and Search Engines
Relevance matters to each of us on a daily basis. It enables us to focus on the things that are important in our lives. It’s something that each of us learns about every day, and has been learning about since around the time that we first learned to crawl, though not necessarily consciously.
Relevance and Evidence
I first began purposefully studying relevance a number of years ago, but not to help websites show up in search engines. My introduction to relevance as something I needed to learn, and needed to learn well, came in law school, in classes like Evidence and Criminal and Civil Procedure. In Evidence, we spent the class learning about the rules of evidence. Under Rule 401 of the Federal Rules of Evidence, evidence is relevant if:
(a) it has any tendency to make a fact more or less probable than it would be without the evidence; and
(b) the fact is of consequence in determining the action.
There are actually a number of rules involving whether or not evidence can be admitted in a courtroom in a criminal or civil proceeding.
One of those, for instance, is the hearsay rule. While you want proof of something to be admitted, you want it to be reliable evidence. Because of that, there’s a strong preference towards witnesses sharing their actual experiences when they testify to the truth of something. If I were to be a witness in a case, I would testify about my own experiences to prove something. I wouldn’t testify to someone else’s experience. I couldn’t take the witness stand, be sworn in, and then tell the judge and jury the words of someone else as proof of what they saw. If I were to testify that Joe said that Sam stabbed Edward, the Court would want to know why Joe wasn’t on the stand instead of me. The opposing lawyer should be objecting to my testimony under the hearsay rule.
There are a large number of exceptions to the rule against hearsay as evidence. For example, one of those covers a dying declaration from someone. If they passed away, they aren’t available to testify in person. If they knew they were dying, they were also assumed to understand the seriousness of their statements.
We had a special guest speaker in my evidence class who was an advocate in a number of high profile legal battles, and had also worked as a prosecutor in a good number of cases. Before the class began, he passed out some copies of his law school grades. To say that he wasn’t a very good student would be an understatement. But he told us why he was extremely effective as a lawyer. He knew the rules of evidence inside out and backwards. He not only carried a copy of the Rules of Evidence around with him all the time, but he also kept extra copies of it in his office, in his car, in his kitchen, and even in his bathroom. He practiced his closing statements, and his objections, in his underwear in front of a mirror at night before he went to sleep.
Search engines are one of the primary tools that most of us use to learn about the world around us. When we search for something, we expect the results that we see to be both relevant and important for the terms that we entered into a search box. The relevance of the answers to our queries is as important as the relevance of evidence in a legal case, in that those answers can shape what we think and influence our actions.
Relevance and Information Retrieval
Just like with Evidence, there are a number of rules that search engines follow when it comes to determining whether or not something is relevant.
When someone who does SEO for a living is asked about how search engines rank pages in search results, their first answer might be that a search engine will return pages that are relevant to a query, and will rank those pages based upon how relevant and how important they might be. That determination of relevance by a search engine follows some practices identified in information retrieval, and a relevance score is often referred to in patents and papers from the search engines as an information retrieval score.
Relevance has long been studied in information retrieval. One of the people who has been specifically studying and writing about relevance is Dr. Tefko Saracevic, a professor at the School of Communication, Information and Library Science at Rutgers University. If you do SEO, or if you’re very interested in how relevance might be defined, it’s highly recommended that you read at least one of his papers on the topic, possibly starting with Relevance: A Review of the Literature and a Framework for Thinking on the Notion in Information Science. Part II (pdf). Note that Part I was written around 30 years ago. Dr. Saracevic has been studying relevance for a long time.
The following three quotes are used to start the paper:
“Relevant: having significant and demonstrable bearing on the matter at hand.”
“Relevance: the ability (as of an information retrieval system) to retrieve material that satisfies the needs of the user.”
Merriam Webster (2005)
“All is flux.”
Plato on Knowledge in the Theaetetus (about 369 BC)
Even better, if you get the chance, I would recommend watching a presentation that Dr. Saracevic gave on Relevance at the University of Tennessee in 2007, at:
Relevance in Information Science. (The video isn’t coming up through the link, but here’s a link to the Abstract. Maybe it will start working again.) It is a little over an hour long, but your understanding of how relevance is used in information science and how it has evolved over time will be greatly expanded.
Among some of the things Dr. Saracevic points out is that relevance is dynamic. What’s relevant to us can change based upon how much we know about a topic. It can alter depending upon whether we are first exploring a topic, comparing different pages on that subject, or even trying to buy something. As our informational and situational needs change, so does what might be relevant for us.
Relevance and Search Engines
Search Google for [Baltimore ravens] and your intent may have been to find the latest score in a football game, the origin of the team name, or a roster of the players on the team. You could be searching for tickets, or the location of their stadium. If it’s the morning of a home game, and you’re searching from the area around Baltimore, Google may focus primarily upon where you could get tickets. If you search when the game is late in the fourth quarter, Google might focus upon the score. If you search in the middle of the summer, before the season starts, Google might offer Ravens’ news focusing upon signing free agents or extending player contracts.
Search engines have been evolving and becoming more sophisticated in how they treat “relevance” when determining which search results to show. Early search engines worked towards finding web pages that contained the keywords you used in your query. This type of relevance was a stand-in for the concept of recall, or the return of all the documents that included those words. Google started out by returning those keyword matches, but also attempted to rank those pages by how important they might be, judged by whether other pages linked to them, with pages linked to by more important pages ranked higher.
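To make those two steps concrete, here is a minimal sketch in Python. It is not Google's actual algorithm. It assumes a tiny made-up set of pages and links (the page names, text, and the `damping` value are all hypothetical), matches keywords first, and then orders the matches with a simplified PageRank-style score in which a page gains importance when important pages link to it.

```python
def keyword_matches(pages, query):
    """Return the names of pages whose text contains every query word."""
    terms = query.lower().split()
    return [name for name, text in pages.items()
            if all(term in text.lower().split() for term in terms)]


def link_importance(links, damping=0.85, iterations=50):
    """Simplified PageRank-style score computed by repeated redistribution."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for source in pages:
            targets = links.get(source, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * score[source] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for p in pages:
                    new[p] += damping * score[source] / n
        score = new
    return score


def search(pages, links, query):
    """Keyword matches, ordered from most to least important by links."""
    importance = link_importance(links)
    return sorted(keyword_matches(pages, query),
                  key=lambda name: importance[name], reverse=True)


# Hypothetical toy data, invented for illustration only.
pages = {
    "ravens-tickets": "baltimore ravens tickets for home games",
    "ravens-schedule": "baltimore ravens schedule and scores",
    "orioles-news": "baltimore orioles news",
    "sports-hub": "sports links directory",
}
links = {
    "sports-hub": ["ravens-schedule", "orioles-news"],
    "orioles-news": ["ravens-schedule"],
    "ravens-tickets": ["ravens-schedule"],
    "ravens-schedule": ["sports-hub"],
}

print(search(pages, links, "baltimore ravens"))
```

Both Ravens pages match the query, but the schedule page is listed first because three other pages link to it while nothing links to the tickets page.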
Google used a different definition of relevance when they started showing advertisements on third-party web pages in their AdSense program, which involved placing pages in different classifications based upon the topics or categories of those pages. I wrote about it in the post, Google’s Second Most Important Algorithm? Before Google’s Panda, there was Phil.
That category approach required that the search engine look at the words and phrases that appear upon pages, and find documents where the same words and phrases tended to co-occur. Documents with co-occurring words would be clustered together, and serve as “categories” for those pages.
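As a rough illustration of grouping documents by shared vocabulary, here is a small Python sketch. It is not the method Google used. It swaps in plain Jaccard word overlap and a greedy grouping pass, and the documents and the `threshold` value are made up for the example.

```python
def jaccard(a, b):
    """Share of words two documents have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cluster_by_shared_words(docs, threshold=0.3):
    """Greedily put each document in the first cluster it overlaps with."""
    word_sets = {name: set(text.lower().split()) for name, text in docs.items()}
    clusters = []
    for name, words in word_sets.items():
        for cluster in clusters:
            if any(jaccard(words, word_sets[other]) >= threshold
                   for other in cluster):
                cluster.append(name)
                break
        else:
            # No existing cluster shares enough words, so start a new one.
            clusters.append([name])
    return clusters


# Hypothetical documents, invented for illustration only.
docs = {
    "pizza-1": "pizza dough cheese oven",
    "pizza-2": "pizza cheese oven toppings",
    "ravens-1": "ravens football stadium tickets",
    "ravens-2": "ravens football roster stadium",
}

print(cluster_by_shared_words(docs))
```

Each resulting cluster plays the role of a "category" here. A real system would need to weight common words down and handle documents that bridge two clusters, which this greedy pass does not attempt.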
We’ve seen Google determining that some queries evidence a desire to see maps and businesses for a certain location, even when the location itself isn’t included within the query. This type of situational need presents “relevant” results such as map results for nearby pizza parlors when we type the word [pizza] into a Google search box.
Google’s new Knowledge base search results show us information about entities that appear in our queries, and some additional information as well. When you’re signed into Google, a search for [espn radio] shows three listings in the knowledge base results for “People related to espn radio.” It’s highly likely that Google examined its query log files to see what other things people search for when they search for [espn radio], to try to anticipate our next query.
In that case, Google is trying to return relevant results by learning from previous searchers and how it may have helped meet their situational and informational needs.
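One simple way to picture learning from previous searchers is to count which query tends to follow another within a search session. The Python sketch below does that under loud assumptions: the session format and the example queries are hypothetical, and nothing here reflects how Google actually processes its logs.

```python
from collections import Counter, defaultdict


def build_followups(sessions):
    """Count how often each query is immediately followed by another query."""
    followups = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            if current != following:
                followups[current][following] += 1
    return followups


def related_queries(followups, query, k=3):
    """Return up to k of the most common follow-up queries for a query."""
    return [q for q, _ in followups.get(query, Counter()).most_common(k)]


# Hypothetical search sessions, invented for illustration only.
sessions = [
    ["espn radio", "espn radio hosts"],
    ["espn radio", "espn radio schedule"],
    ["espn radio", "espn radio hosts", "espn radio schedule"],
    ["espn radio", "espn radio live"],
    ["espn radio", "espn radio hosts"],
    ["espn radio", "espn radio schedule"],
]

followups = build_followups(sessions)
print(related_queries(followups, "espn radio", k=2))
```

In this toy log the hosts query follows [espn radio] three times and the schedule query twice, so those two would be the suggestions shown alongside the results.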
Search engines do try to return relevant results in response to a query, but that definition of relevance is a dynamic and shifting one that doesn’t always depend upon whether or not the keywords from a query match keywords used on a page in the page title, heading, content, and in anchor text pointing to that page.