How a Search Engine Might Use a Searcher’s Knowledge, Interests, and Education to Rerank and Validate Search Results

The number of pages on the Web that a search engine could try to index is extremely large, and the approaches that search engines use to index and rank those pages are mostly automated. That doesn’t mean that the search engines don’t have people look at search results and try to gauge how relevant those automated results might be.

A search engine typically locates web pages that contain the keywords a searcher enters into a search box. The order in which those results appear is based upon a number of algorithms that look at various factors, such as the frequency and number of the entered keywords within each page and the position of those keywords within each page.

For example, a first page that has a keyword in its title or near the top of the page might rank higher than a second page that has the keyword in a footer or near the bottom of the page. The first page might be presented to a searcher before the second page because of the location of the keyword.
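
The title-versus-footer example can be sketched in a few lines of Python. This is only a toy illustration, not how any search engine actually scores pages: the `keyword_score` function, its position weighting, and the two sample pages are all assumptions made up for this sketch.

```python
def keyword_score(page_text: str, keyword: str) -> float:
    """Toy score: every occurrence of the keyword counts, and counts more near the top."""
    words = page_text.lower().split()
    keyword = keyword.lower()
    if not words:
        return 0.0
    score = 0.0
    for position, word in enumerate(words):
        # Strip simple punctuation so "Vista." still matches "vista".
        if word.strip(".,:;!?\"'") == keyword:
            # Weight runs from 1.0 for the first word down toward 0.0 for the last.
            score += 1.0 - position / len(words)
    return score


# Hypothetical pages: one mentions the keyword at the top, the other in a footer.
first_page = "Vista upgrade guide. Learn how to install the new operating system step by step."
second_page = "Learn how to install the new operating system step by step. Footer: Vista"

pages = {"first page": first_page, "second page": second_page}
ranked = sorted(pages, key=lambda name: keyword_score(pages[name], "vista"), reverse=True)
print(ranked)
```

Because the score sums over occurrences, a page that repeats the keyword more often also scores higher, which covers the frequency factor in the same toy model.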

While this automated approach might be satisfactory to some searchers, other searchers might find rankings of pages to be inadequate or irrelevant to their needs.

How might a search engine verify page ranking results of a search algorithm with respect to the specific needs or characteristics of specific groups of users?

A recent patent application from Yahoo explores the topic, and it wouldn’t be too much of a surprise if the other major search engines employed some processes of their own to do something similar. In fact, a set of Quality Guidelines (pdf) from Google was uncovered, which provides instructions to people who manually review the pages that appear in Google’s search results.

Search Engine Quality Raters

Yahoo’s patent filing appears to indicate that they may have a similar approach to Google in reviewing the quality of their search results. Since Google recruits quality reviewers online, I looked around to see if I could find any classifieds for reviewers from Yahoo. I did find a classified ad for someone who might work as a “Search Quality Analyst” at Yahoo. The ad described some of what such an analyst might do at Yahoo’s Santa Monica Office:

The successful candidate will contribute to Yahoo’s search results quality measurement and anti-spam efforts by:

– Participating in several relevance tests designed to measure data quality across Yahoo’s web search product line.
– Training and mentoring team members.
– Receiving and administering feedback.
– Identifying, problem solving, and/or clearly communicating issues or problems.
– Coordinating resources to meet various relevance testing objectives.
– Building training sets to support the development of automated classifiers.
– Analyzing user session data to understand query intent and user behaviors.
– Contributing to the development of web spam detection methods.
– Performing other web search data QA tasks as assigned.

A somewhat similar, though less detailed, ad at Craigslist for a “Search Quality Analyst” might or might not be from Yahoo. The Santa Monica location and the description of the company involved do make it sound like it might be from Yahoo. That ad was looking for someone to:

– Perform web search result review and evaluation tasks as assigned
– Review search results and grade the results based on guidelines provided by instructors

Google does offer part time “quality rater” positions that one can telecommute to, and it’s possible that Yahoo may too.

But it’s also possible that, instead of specifically hiring people to act as quality reviewers, Yahoo may rely upon information that ordinary searchers enter about their interests, knowledge, and educational backgrounds, together with the web pages those searchers select, to review the quality of Yahoo search results.

Yahoo’s Quality Ratings

While we have a sense of what Quality Reviewers might look at when rating sites from Google’s handbook, we don’t know much about what Google looks for in the people they hire to rate web sites. The Google ad for a “quality rater” tells us that they are looking for people who meet certain requirements:

The ideal candidate would encompass the following qualities:

– In depth, up-to-date familiarity with English-speaking web culture and media.
– If you have knowledge of other languages cultures, please indicate this on your resume.
– Broad ranges of interests.
– Strong ability to read and write in the English language.
– Excellent web research skills and analytical abilities.
– Excellent written communication skills.

If you have these qualities, you may be exactly what we’re looking for!

Other requirements include the following:

– Bachelor’s degree or equivalent.
– A high-speed internet connection.
– Valid U.S. or Canadian work authorization.

The Yahoo patent throws an interesting twist into the concept of quality raters of search results. It doesn’t involve the hiring of quality reviewers, but rather takes advantage of information that it may learn about searchers to see which pages they select in search results, and compare what they choose to the results shown in response to searches for specific queries.

They tell us in the patent filing that they may rank “users’ knowledge and/or interest in specific categories” and also rank those people based upon “education level and field.”

The user rankings may influence their ratings for search results generated for particular search terms.

Example of Raters’ Rankings used in Quality Reviews

Two different searchers look at search results for the term “Vista”, and receive the same search results x, y, and z.

The first searcher selected search result “z”, and the second searcher chose search result “y.”

The search term “Vista” belongs to the categories “Technology and Telecommunications” and “Biz.”

The first searcher has a User Knowledge Ranking of 3 in the “Technology and Telecommunications” category and a User Knowledge Ranking of 6 in the “Biz” category.

The second searcher has a ranking of 3 and 1 in these same categories.

Since the first searcher, who selected search result “z”, has rankings of 3 and 6 in the “Technology and Telecommunications” and “Biz” categories of the search term “Vista”, this searcher’s rankings (3+6) are added to the relevance score for search result “z” to achieve a total score of 9.

When other people search on the term “Vista” and select search result “z”, their rankings in the “Technology and Telecommunications” and “Biz” categories may also be added to the total relevance score for search result “z.”

For the search result “y”, since the second searcher who selected this search result has rankings of 3 and 1 in the “Technology and Telecommunications” and “Biz” categories of the search term “Vista”, this searcher’s rankings (3+1) are added to the relevance score for search result y to achieve a total score of 4.

Since no one selected search result “x”, the relevance score is 0 for search result “x.”

If there were an education ranking for each of the searchers, that would also be added to the relevance scores for each of the results for that particular query.

In the absence of an education ranking, the search results for the query term “Vista,” based upon the category scores for the two searchers would have a compiled ranking of z, y, and x, from highest to lowest rank. That could change as more searchers with different rankings for the “Technology and Telecommunications” and “Biz” categories search for “Vista” and select specific search results.
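
The arithmetic in this example can be written out as a short Python sketch. The data structures and the `compile_relevance_scores` function are my own hypothetical names, not anything from the patent filing; the sketch simply adds each selecting searcher’s category rankings to the score of the result they chose, and leaves out education rankings as the example does.

```python
# Categories that the search term belongs to, as in the example.
query_categories = {"Vista": ["Technology and Telecommunications", "Biz"]}

# Each searcher's knowledge rankings per category.
knowledge_rankings = {
    "first searcher": {"Technology and Telecommunications": 3, "Biz": 6},
    "second searcher": {"Technology and Telecommunications": 3, "Biz": 1},
}

# (searcher, search term, selected result)
selections = [
    ("first searcher", "Vista", "z"),
    ("second searcher", "Vista", "y"),
]


def compile_relevance_scores(query, results, selections, rankings, categories):
    """Sum the category rankings of every searcher who selected each result."""
    scores = {result: 0 for result in results}  # unselected results stay at 0
    for searcher, selected_query, result in selections:
        if selected_query != query:
            continue
        for category in categories[query]:
            scores[result] += rankings[searcher].get(category, 0)
    return scores


scores = compile_relevance_scores(
    "Vista", ["x", "y", "z"], selections, knowledge_rankings, query_categories
)
new_ranking = sorted(scores, key=scores.get, reverse=True)

print(scores)       # {'x': 0, 'y': 4, 'z': 9}
print(new_ranking)  # ['z', 'y', 'x']
```

Appending more entries to `selections` shows how the compiled ranking could shift as more searchers with different rankings choose results for the same term.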

The Yahoo Search Validation Patent Application

Knowledge and Interests Based Search Term Ranking for Search Results Validation
Invented by Jian Wang
Assigned to Yahoo
US Patent Application 20080140641
Published June 12, 2008
Filed: December 7, 2006

Abstract

Ways to verify rankings of search results, produced by a search algorithm executed for a particular search term.

A number of users’ knowledge and/or interest in specific categories may be ranked to be used to calculate new rankings of search results, e.g., web pages based on search terms.

Users may also be ranked by education level and field.

These user rankings are then used to determine a new ranking of search results that are generated for a particular search term.

For instance, the users that select (e.g., or click on) a particular search result cause a relevance score to be compiled based on such users’ rankings in the categories to which the search results or search term belongs.

Relevance scores are compiled for each search result that is selected by a plurality of users executing a number of searches.

The new ranking of the search results for a particular search term is determined based on the relevance scores of such search results.

It can then be determined whether the current ranking, produced for a particular search term by the search algorithm, is valid by comparing this new ranking to the current ranking.
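
The abstract doesn’t say how the two rankings would be compared, so the following Python sketch is only one plausible way to do it, under my own assumptions: it measures the fraction of result pairs that both rankings place in the same order, and treats the current ranking as valid when that fraction meets a made-up threshold. The function names, the threshold, and the assumed current order of x, y, z are all hypothetical.

```python
from itertools import combinations


def pairwise_agreement(current_ranking, new_ranking):
    """Fraction of result pairs that both rankings put in the same relative order."""
    new_position = {result: index for index, result in enumerate(new_ranking)}
    shared = [result for result in current_ranking if result in new_position]
    # combinations() keeps the order of current_ranking, so `a` is ranked above `b` there.
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 1.0
    agreeing = sum(1 for a, b in pairs if new_position[a] < new_position[b])
    return agreeing / len(pairs)


def is_ranking_valid(current_ranking, new_ranking, threshold=0.8):
    """Hypothetical validity test: enough pairs must agree between the two rankings."""
    return pairwise_agreement(current_ranking, new_ranking) >= threshold


# Assumed for illustration: the search algorithm currently shows x, y, z in that order,
# while the user-based relevance scores produce z, y, x.
current = ["x", "y", "z"]
user_based = ["z", "y", "x"]

print(pairwise_agreement(current, user_based))  # 0.0
print(is_ranking_valid(current, user_based))    # False
```

A low agreement score like this one would flag the query for a closer look, while identical rankings would score 1.0.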

Where do Searcher Knowledge, Interest, and Education Rankings Come From?

Information about a searcher’s interests could be collected when a searcher registers with a search engine like Yahoo and enters information about their knowledge and interests in a number of categories. They could possibly be asked to rank themselves on a scale of 1 to 10.

Or their rankings in categories could be implied from other information, such as their occupational position or field, or their field or level of education.

I’m not sure if collecting information about a user’s interests, knowledge, and educational level in this manner is an ideal approach, and it’s quite possible that other methods may also be used to collect this kind of information.

Conclusion

A good number of white papers and patents from the search engines in recent years describe using data collected from people searching and browsing the Web to improve search results.

Patents and papers about personalization of search results tell us how user behavior information might be used to influence the search results that individuals may see. A few of those documents have told us that data from people who share some common interests, and who tend to select a lot of the same pages, may influence the search results that those people see.

The approach in this Yahoo patent filing, by contrast, may affect the rankings of pages in search results for everyone who searches at the search engine.

19 thoughts on “How a Search Engine Might Use a Searcher’s Knowledge, Interests, and Education to Rerank and Validate Search Results”

  1. Hi Dan,

    It can be a surprise to learn that a search engine may use human reviewers to look at their search results, and try to gauge how relevant those results might be. I think that it’s not a bad idea to develop processes that attempt to determine search quality, and including some manual reviewers can help.

    I suspect that there are other forms of testing and validation that we don’t hear too much about, also.

  2. I have heard about Google employing quality testers in the past. It does make sense, as it’s just another level of QA testing that would be part of any software. Given that search engine software is developed for data inputs (websites) that the developers have no control over, it seems logical that they would employ many levels of QA.

  3. It is amazing how much effort Google puts into ensuring that users have a good-quality web surfing experience. It is a perpetual race between site spammers and Google. While some of Google’s directions are debatable, the intent of keeping internet sites free of rubbish is agreeable.

  4. Hi Chris,

    Good point. One of the biggest challenges that a site like Google faces is that they do use data from so many sources that they have no control over, using many different variations of code, different data structures, different languages, and other obstacles. QA is very important to what they do in so many different ways.

    Hi Tony,

    Web spam is one of those issues that Google faces, but making sure that a user experience is a good one goes far beyond just fighting spam. Even in the absence of web spam, making sure that the results that show up in response to queries are relevant and quality results can be a challenge.

    If Google makes changes to how they determine the relevance of results on a regular basis, and it seems that they do, they have to check and test to make sure that the changes that they make improve the results that they show in some way. Some of that checking may be automated, but a manual review can be helpful, too. :)

  5. Wouldn’t click popularity be a better way of assessing relevance?

    If varied users doing the same basic keyword search clicked listings and then returned to the SERPs after a few seconds, this would trigger a red flag if those same users found other sites on the same page that they stayed on for a substantial amount of time.

    Look at the complex url parameters for each listing on the SERPs of Google, Yahoo and MSN – compare that to a few years ago when you would click DIRECTLY on the link.

  6. Hi PR,

    To some degree, the process described in this patent application is looking at a measure of click popularity, but it’s also paying attention to information that it knows about the people doing the clicking.

    It is quite possible that user behavior like a quick return back to the search engine after a look at a page might be considered as a negative vote – though it’s possible that someone clicking through may realize that they’ve seen the page that they landed on before – so using that as a signal for a small number of searchers might not be the greatest of signals, negative or positive.

    Good point on the URLs that you see in search results at the major search engines. I suspect that there are a number of good reasons for the search engines to not use the actual URLs to a page, even though they might display them in the search results.

  7. Hi Bill,

    I have heard about these potential new search considerations quite a lot. One of the things that I have been trying to research, is how the Search Engines are going to be able to gather and store this information each time for each individual user.

    Is Google (and/or the other search engines) going to do it via cookies? Or will they only be able to personalise your search if you are logged in to their services?

    I, for one, use about 3 different computers, and usually buy new computers every 12 months or so. I would be interested to learn how these new ‘Personalised Searches’ are going to be achieved.

    Any Thoughts?

    James.

  8. Hi James,

    The easiest way for the search engines to collect data about an individual’s browsing and searching history is to collect it when they are logged in to personalized search. But the use of a toolbar itself can provide that kind of history as well, even if one isn’t logged in, if one is using some of the features in the toolbar, such as the one which shows PageRank of pages that you visit in Google.

    In the absence of personalized search being turned on, or the right features turned on in the toolbar, a search engine could try to use cookies, or could look at data in their own search logs, or access logs purchased from Internet Service Providers (who collect a lot of information about our surfing and browsing activities). The use of cookies during a search session may enable the search engine to see some immediate past queries that you’ve used to suggest new pages, and to present some advertising based upon the searches that you’ve performed recently.

    I wrote a post not too long ago about the synchronization feature in Google’s latest version of their toolbar: Google Toolbar 5: Sync Your Settings and Share Your Browsing History.

    If you move from one computer to another during the course of a day, and other people use those computers also, signing into the Google toolbar provides you with some benefits, such as the ability to look at Google Bookmarks that you may have saved and take them with you from one computer to another, or specialized gadget buttons on your toolbar, or the use of Google Notebook, etc. That’s done with the sync function of the toolbar. But if you log in to use the sync function, you are also providing the search engine with information about your personal browsing and searching activity at each of those computers.

    Enough people use Yahoo portal functions such as email, finance, MyYahoo, MyBlogLog, etc., that attaching a cookie to their activities is a possibility.

  9. You’ve dug up the dirt with the “Quality Rater”. It seems the only info we can get in regard to Google is that which they have no choice but to publish, be it positions or patent filings. Someone should get smart and create a “Google Database” of public knowledge.

  10. Hi Howrank,

    A Google database like that would be pretty interesting. I suspect that some private companies may have such a database, or at least they probably should. :)

  11. I think there are a lot of things we do not see or that will change quickly with google.com as they try to keep results at their best. Click popularity and local search, I believe, will play a part in this, as will things like Google Maps, YouTube, etc., which have also started to rate well in results for certain searches. Look for results to cross promote other Google-owned components and features.

  12. Hi Dean,

    I expect the same – that we will see more changes rolling out from Google on a regular basis that attempt to take advantage of integrating technology and content from them, and make those services better.

    For instance, making it easier to subscribe to RSS feeds from YouTube would make Google Reader a more useful tool for discovering more videos, but it would also be helpful in other feed readers.

  13. I’ve heard a rumor that Google pays attention to how long someone spends on a web page. I wonder if that is true. I have also heard that Google expects searches for the full domain name occasionally.

  14. Hi Rob,

    Google did officially announce this past week that they would be considering page speed as a ranking signal, on the Official Google Webmaster blog, in their post: Using site speed in web search ranking.

    That’s not the same as the search engine considering the amount of time someone might spend on a web page, but it’s an interesting and related concept.

    There have been a number of patent filings from Google that date back at least five years, if not further, that mention that Google might consider how long someone does spend on a web page in determining the quality of that page, and even whether or not it might be considered spam. For example, Google’s patent on Information Retrieval Based on Historic Data, originally filed in 2003, tells us:

    [0093] According to an implementation consistent with the principles of the invention, information corresponding to individual or aggregate user behavior relating to a document over time may be used to generate (or alter) a score associated with the document. For example, search engine 125 may monitor the number of times that a document is selected from a set of search results and/or the amount of time one or more users spend accessing the document.* Search engine 125 may then score the document based, at least in part, on this information.

    * emphasis, mine.

    There are others from Google that mention that they might consider the amount of time someone spends on a page as well.

  15. Is the human reviewer mentioned above akin to a Google focus group? Never realised they undertook such efforts – are these non-Googlers? I know most Google products are reviewed in-house before showtime.

  16. Hi Matthew,

    The quality raters from Google are employees of Google. While their ratings might sometimes influence the rankings of pages, they also act as an internal validation system to reflect how well the search algorithms that return pages are working.

  17. Is the human reviewer mentioned above akin to a Google focus group? Never realised they undertook such efforts – are these non-Googlers? I know most Google products are very popular around the whole world, with more users every day. Thanks to Google.
