Yahoo Collaborative Tagging Suggestions Use Goodness to Combat Tag Spam

Tagging allows people to assign keyword labels to content, so that they can share what they find, recall what they’ve looked at before, and discover content that others have labeled.

Tagging can also be prone to spam and to bad tag suggestions. A Goodness Measure might be used to suggest tags while keeping bad tags and spam out of those suggestions. It looks at:

  • The authority of a person tagging,
  • The probability that a person tagging an object with one keyword might tag the same object with another keyword that frequently co-occurs with the first one in the tags used by others for that object,
  • The probability that any object tagged with one keyword is also tagged with the other keyword, based upon tags used by others.
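
To make the three signals above concrete, here is a minimal sketch in Python. It is not the formula from the Yahoo patent filing, which this excerpt does not give. The function names, the sample data, the authority scores, and the equal 50/50 weighting are all assumptions made for illustration.

```python
from collections import defaultdict

def weighted_same_object(taggings, authority, obj, given, candidate):
    """Authority-weighted share of users who tagged `obj` with `given`
    and also tagged it with `candidate`."""
    per_user = defaultdict(set)
    for user, o, tag in taggings:
        if o == obj:
            per_user[user].add(tag)
    total = 0.0
    agree = 0.0
    for user, tags in per_user.items():
        if given in tags:
            weight = authority.get(user, 0.0)  # unknown users count for nothing
            total += weight
            if candidate in tags:
                agree += weight
    return agree / total if total else 0.0

def any_object(taggings, given, candidate):
    """Share of objects tagged with `given` that are also tagged with `candidate`."""
    by_object = defaultdict(set)
    for _, obj, tag in taggings:
        by_object[obj].add(tag)
    with_given = [tags for tags in by_object.values() if given in tags]
    if not with_given:
        return 0.0
    return sum(candidate in tags for tags in with_given) / len(with_given)

def goodness(taggings, authority, obj, given, candidate):
    """Hypothetical combination: an equal-weight average of the two estimates."""
    same = weighted_same_object(taggings, authority, obj, given, candidate)
    anywhere = any_object(taggings, given, candidate)
    return 0.5 * same + 0.5 * anywhere

# Made-up (user, object, tag) triples and authority scores.
taggings = [
    ("alice", "photo1", "sunset"), ("alice", "photo1", "beach"),
    ("bob", "photo1", "sunset"), ("bob", "photo1", "beach"),
    ("spammer", "photo1", "sunset"), ("spammer", "photo1", "cheap-pills"),
    ("alice", "photo2", "sunset"), ("alice", "photo2", "mountain"),
]
authority = {"alice": 0.9, "bob": 0.8, "spammer": 0.05}

beach_score = goodness(taggings, authority, "photo1", "sunset", "beach")
spam_score = goodness(taggings, authority, "photo1", "sunset", "cheap-pills")
```

With this sample data, the tag from the low-authority account scores lower than the tag that two trusted users agree on, even though both tags appear on the same object.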

Continue reading “Yahoo Collaborative Tagging Suggestions Use Goodness to Combat Tag Spam”

ChaCha Search – Is People Powered Search Better?

I really haven’t taken a close look at ChaCha Search before today, but the idea is interesting – using human search guides who can ask you specific questions about what you are looking for, and who will help you find answers.

A question raised by the approach is how well it can scale: can it handle questions from a lot of people, and are there enough expert searchers who would participate?

I’ve run across two patent applications assigned to them, and an unassigned one that lists the CEO of the company as the inventor and is referred to by one of the assigned patent filings:

Search Tool Providing Optional Use of Human Search Guides
Invented by Scott A. Jones and Thomas F. Cooper
Assigned to: ChaCha Search, Inc.
US Patent Application 20070174273
Published July 26, 2007
Filed: September 1, 2006

Continue reading “ChaCha Search – Is People Powered Search Better?”

Why Sometimes Best Search Results aren’t Always Top Search Results

When we talk about the results that show up in search engines, we often do so in terms related to relevance and importance of those results.

Sometimes the results we see, and that we don’t see, are influenced by other factors, such as steps taken by the search engines to reduce the amount of work that they have to perform in order to return results to searchers.

Using Two Tiers of Search Results

If a search potentially returns thousands of results, and people only look at the first few pages of those results, it would make sense for a search engine to serve results in batches, and perhaps only initially use a modified (and much smaller) version of their database to answer search queries.

A first index tier may have a number of potential results pruned, so that only documents that are more likely to be returned as top answers to searches are kept. The first batch of results returned to searchers may be taken from this pruned index.
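
A rough sketch of how such a two-tier lookup might behave, in Python. The pruning rule used here (keep the highest-scoring few documents per term) and the names `build_first_tier` and `search` are assumptions for illustration. The excerpt does not say how a search engine actually decides what to prune.

```python
def by_score(doc):
    """Sort key: the score stored in a (doc_id, score) pair."""
    return doc[1]

def build_first_tier(full_index, keep_per_term):
    """Keep only the highest-scoring documents for each term (assumed pruning rule)."""
    return {
        term: sorted(docs, key=by_score, reverse=True)[:keep_per_term]
        for term, docs in full_index.items()
    }

def search(term, first_tier, full_index, offset=0, limit=2):
    """Serve a page of results from the pruned tier when it can fill the page;
    otherwise fall back to the full index."""
    pruned = first_tier.get(term, [])
    if offset + limit <= len(pruned):
        return pruned[offset:offset + limit], "first tier"
    ranked = sorted(full_index.get(term, []), key=by_score, reverse=True)
    return ranked[offset:offset + limit], "full index"

# Made-up postings: term -> list of (document id, score).
full_index = {
    "patent": [("doc1", 0.9), ("doc2", 0.4), ("doc3", 0.7), ("doc4", 0.2)],
}
first_tier = build_first_tier(full_index, keep_per_term=2)

page_one = search("patent", first_tier, full_index, offset=0)
page_two = search("patent", first_tier, full_index, offset=2)
```

Here the first page is answered from the small pruned tier, and only a request for the second page touches the full index. A document pruned from the first tier can never appear in that first batch, whatever its merits for a particular query.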

Continue reading “Why Sometimes Best Search Results aren’t Always Top Search Results”

Google Patent Granted on Semantic Units (Meaningful Compounds)

Semantic Units Found in Search Queries

When searchers type a query into a search engine, it isn’t uncommon for them to use more than one word. It also isn’t unusual for those words to be a semantically meaningful phrase rather than just a list of keywords.

Multiple search terms entered by a user are often more useful if considered by the search engine as a single compound unit. Assume that a user enters the search terms “Baldur’s gate download.”

The user intends for this query to return web pages that are relevant to the user’s intention of downloading the computer game called “Baldur’s gate.” Although “Baldur’s gate” includes two words, the two words together form a single semantically meaningful unit. The same is true with a phrase such as “New York,” which is two words that go together as a semantic unit.

If the search engine is able to recognize “Baldur’s gate” as a single semantic unit, called a compound herein, the search engine is more likely to return the web pages desired by the user.
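
As a small illustration of treating a compound as one unit, the Python sketch below splits a query by greedy longest match against a list of known compounds. The compound list is supplied by hand here, and the function name `segment_query` is made up. How the patent actually identifies compounds is not covered in this excerpt.

```python
def segment_query(query, compounds, max_words=3):
    """Split a query into units, preferring the longest known compound
    that starts at each position."""
    words = query.lower().split()
    units = []
    i = 0
    while i < len(words):
        # Try the longest span first, then shorter ones; a single word always matches.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in compounds:
                units.append(candidate)
                i += n
                break
    return units

# Hand-picked compounds, assumed to be known in advance.
known_compounds = {"baldur's gate", "new york"}

units = segment_query("Baldur's gate download", known_compounds)
```

For the example query from the patent, this yields the compound "baldur's gate" followed by the single word "download", rather than three unrelated keywords.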

Continue reading “Google Patent Granted on Semantic Units (Meaningful Compounds)”

Looking at Final Landing Pages for Suggestions for Query Revisions

Query Revisions Suggestions Can Be Based Upon Landing Pages

It’s getting pretty common for search engines to suggest query revisions when someone does a search these days.

One common query revision strategy is to look at the query sessions from previous searchers who used the same query, and see how they might have revised their queries, including spelling corrections, or adding and deleting words in subsequent queries during the same session.

A paper from Microsoft researchers, Query Suggestion based on User Landing Pages, takes that approach, and considers using it in conjunction with another query revisions approach that looks at what they call “final landing pages.”

This poster investigates a novel query suggestion technique that selects query refinements through a combination of many users’ post-query navigation patterns and the query logs of a large search engine. We compare this technique, which uses the queries that retrieve in the top-ranked search results places where searchers end up after post-query browsing (i.e., the landing pages), with an approach based on query refinements from user search sessions extracted from query logs.
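
The session-based approach mentioned above can be sketched in a few lines of Python: count which queries searchers typed immediately after a given query and offer the most frequent ones. The session data and the function name `suggest_revisions` are invented for illustration, and this is not the method from the Microsoft paper.

```python
from collections import Counter

def suggest_revisions(sessions, query, top_n=3):
    """Return the queries that most often directly followed `query`
    within the same search session."""
    followups = Counter()
    for session in sessions:
        for current, following in zip(session, session[1:]):
            if current == query and following != query:
                followups[following] += 1
    return [q for q, _ in followups.most_common(top_n)]

# Made-up query sessions, each an ordered list of queries from one searcher.
sessions = [
    ["jaguar", "jaguar car"],
    ["jaguar", "jaguar animal"],
    ["jaguar", "jaguar car", "jaguar car price"],
]

suggestions = suggest_revisions(sessions, "jaguar")
```

With these sessions, "jaguar car" is suggested ahead of "jaguar animal" because it followed the original query more often.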

Continue reading “Looking at Final Landing Pages for Suggestions for Query Revisions”