An Expansion of Importance Scores for Web Page Rankings?
When a judge looks at evidence entered into court, he weighs a number of factors. One of them is whether the evidence offered is relevant to the case at hand.
Another is how important that evidence might be.
Now, a piece of evidence doesn’t have to be groundbreaking to be important. But testimony about the character of a 40-year-old defendant in a criminal proceeding, given by his third-grade teacher, may be somewhat relevant yet probably not all that important.
Importance Scores in Search Engines
When a search engine ranks pages for a set of search results, it usually also makes two distinct types of calculations, which it combines to decide which pages to serve to searchers. Those scores likewise focus upon how relevant a result might be to the query entered, and how important that page, picture, or video might be.
A recent patent application from Microsoft takes a look at the way that “importance” is determined, and comes up with a variation that differs somewhat from what we might usually think of in an importance score.
One importance score that most people might be familiar with is PageRank, which is sometimes referred to as a “static” rank because it doesn’t change from query to query.
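To make the split between the two scores concrete, here is a minimal sketch in Python. It is not how any search engine actually computes its rankings: the Page class, the toy term-overlap relevance function, the example URLs, and the 50/50 weighting are all assumptions made for illustration. The only point it shows is that the static rank is stored once per page, while the relevance score is recomputed for every query.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    url: str
    text: str
    static_rank: float  # query-independent importance, fixed at index time


def relevance(query: str, page: Page) -> float:
    """Toy relevance: the fraction of query terms found in the page text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(page.text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def rank(query: str, pages: List[Page], weight: float = 0.5) -> List[Page]:
    """Order pages by a weighted sum of relevance and static rank.

    The weighted sum is an assumed combination rule, chosen for simplicity.
    """
    def score(page: Page) -> float:
        return weight * relevance(query, page) + (1 - weight) * page.static_rank
    return sorted(pages, key=score, reverse=True)


# Hypothetical pages with made-up static ranks.
pages = [
    Page("a.example", "apple pie recipe", static_rank=0.2),
    Page("b.example", "apple orchard news", static_rank=0.6),
]
print([p.url for p in rank("apple pie", pages)])
print([p.url for p in rank("apple", pages)])
```

For the query "apple pie", the more relevant a.example comes first despite its lower static rank. For the query "apple", both pages are equally relevant, so the static rank decides and b.example comes first.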
Expanding Importance Ranking Factors
Would it be faster and easier if, when we perform a search, we could choose to search among the most important “sports” pages, “news” pages, or “recipe” pages? Would we get more relevant results?
One place to start looking at this topic is the Microsoft patent application:
Providing and using search index enabling searching based on a targeted content of documents (Appl. No. 20070203891)
Invented by John A. Solaro and Keith D. Senzel
Assigned to Microsoft
Published August 30, 2007
Filed: February 28, 2006
The authors of this patent make some good points. For instance:
Many new search engines, and new features for existing search engines, are being developed that focus on one specific “vertical” subject matter domain to provide shopping searches, blog searches, research searches, and the like.
However, the static rank of the documents in the index only takes into account generic pagerank attributes, not attributes related to a specific vertical that targets specific subject matter.
Therefore, the static rank is not useful for filtering the index for particular attributes of the vertical in question, which critically limits the effectiveness and utility of these vertical search engines for users.
While using something like PageRank might be an easy method to determine the importance of a page, it may not be the best when comparing things like shopping sites and educational sites.
Different Ranking Factors
The Microsoft patent filing doesn’t go into a lot of detail regarding how it might rank documents in one topic differently than documents in another, though the “claims” section of the document does mention calculating a “readability score” for documents. We might look at an example from another search engine to see how that could be done.
It wasn’t too long ago that a Google patent filing showed how Google might rank blogs based upon a “quality” score, which would be an importance score rather than a relevance score.
So, if a search engine were to go through different websites, determine topics or categories for them, and then come up with a different kind of “quality” or “importance” score for each of those topics, would we see more relevant results in our searches? It’s possible.
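One way to picture that idea is an index that stores a separate importance score per topic for each document, and can then be filtered by topic. The sketch below is only an illustration under my own assumptions: the patent filing doesn’t spell out its data structures, and the index layout, the vertical names, the example URLs, and every score value here are hypothetical.

```python
from typing import Dict, List

# Hypothetical index: URL -> importance score per vertical, each in [0, 1].
# A document has no entry for verticals it does not belong to.
INDEX: Dict[str, Dict[str, float]] = {
    "shop.example/widgets": {"generic": 0.7, "shopping": 0.9},
    "blog.example/widgets": {"generic": 0.8, "shopping": 0.2, "blogs": 0.85},
    "uni.example/widgets": {"generic": 0.6, "research": 0.95},
}


def top_for_vertical(
    index: Dict[str, Dict[str, float]],
    vertical: str,
    k: int = 10,
    threshold: float = 0.0,
) -> List[str]:
    """Return up to k URLs whose score for the vertical exceeds the threshold,
    ordered from most to least important within that vertical."""
    scored = [
        (scores[vertical], url)
        for url, scores in index.items()
        if scores.get(vertical, 0.0) > threshold
    ]
    scored.sort(reverse=True)
    return [url for _, url in scored[:k]]


print(top_for_vertical(INDEX, "generic"))
print(top_for_vertical(INDEX, "shopping", threshold=0.5))
```

In this made-up data, the blog page has the highest generic score, but a shopping search with a threshold of 0.5 returns only the shop page. That is the kind of filtering a single generic static rank cannot provide.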
Blending Results Found with Different Ranking Algorithms
When reading this patent application, I can’t help but think of the universal and blended search results that the search engines are serving these days, where different types of results, ranked by different types of importance factors, are blended into a single set of search results.
How does a search engine determine which video or image or news result is the most important when it mixes those into a result set? A timely news result can’t be ranked in importance by the number of links pointing to it. By the time it had gathered those links, it would no longer be timely.
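As a thought experiment, here is a small Python sketch of what blending might look like if news results were scored on freshness and web pages on links. Nothing here comes from the patent filing or from any search engine’s documentation: the half-life decay, the logarithmic link score, the parameter values, and the sample results are all my own assumptions.

```python
import math
from typing import Dict, List


def freshness_importance(age_hours: float, half_life_hours: float = 6.0) -> float:
    """Assumed news importance: halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def link_importance(inbound_links: int, saturation: int = 1000) -> float:
    """Assumed web importance: log-scaled link count, capped at 1.0."""
    return min(1.0, math.log1p(inbound_links) / math.log1p(saturation))


def blend(results: List[Dict]) -> List[str]:
    """Score each result with the importance function for its type,
    then merge everything into one list ordered by that score."""
    scored = []
    for result in results:
        if result["type"] == "news":
            score = freshness_importance(result["age_hours"])
        else:
            score = link_importance(result["inbound_links"])
        scored.append((score, result["title"]))
    return [title for _, title in sorted(scored, reverse=True)]


# Hypothetical results of mixed types.
results = [
    {"type": "web", "title": "Established guide", "inbound_links": 100},
    {"type": "news", "title": "Story from an hour ago", "age_hours": 1},
    {"type": "news", "title": "Story from two days ago", "age_hours": 48},
]
print(blend(results))
```

With these made-up numbers, the hour-old story outranks the well-linked guide, while the two-day-old story falls to the bottom even though it has had more time to attract links.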
How might a search engine decide whether one type of shopping site was more important than another? Or sports sites? Or news sites? Definitely something to think about.