Yahoo Collaborative Tagging Suggestions Use Goodness to Combat Tag Spam

Tagging allows people to assign keyword labels to content, so that they can share what they find, recall what they’ve looked at before, and discover content that others have labeled.

Tagging is also prone to spam and to bad tag suggestions. A Goodness Measure might be used to offer tag suggestions that avoid bad tags and spam, by looking at:

  • The authority of a person tagging,
  • The probability that a person tagging an object with one keyword might tag the same object with another keyword that frequently co-occurs with the first one in the tags used by others for that object,
  • The probability that any object tagged with one keyword is tagged with the other keyword, based upon tags used by others.
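
To make the last factor more concrete, here is a minimal sketch of estimating how often objects tagged with one keyword also carry another, and then ranking suggestions from trusted taggers only. This is not Yahoo's formula. The (user, object, tag) triple format, the authority scores, the `min_authority` cutoff, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict


def cooccurrence_probability(taggings, given_tag, other_tag):
    """Estimate P(other_tag | given_tag) across tagged objects.

    taggings is an iterable of (user, object_id, tag) triples
    (a hypothetical format, not taken from the patent filing).
    """
    tags_by_object = defaultdict(set)
    for _user, object_id, tag in taggings:
        tags_by_object[object_id].add(tag)
    with_given = [tags for tags in tags_by_object.values() if given_tag in tags]
    if not with_given:
        return 0.0
    with_both = sum(1 for tags in with_given if other_tag in tags)
    return with_both / len(with_given)


def suggest_tags(taggings, given_tag, authority, min_authority=0.5, top_n=3):
    """Rank candidate tags by co-occurrence, ignoring low-authority taggers."""
    trusted = [(u, o, t) for u, o, t in taggings
               if authority.get(u, 0.0) >= min_authority]
    candidates = {t for _u, _o, t in trusted if t != given_tag}
    scored = [(cooccurrence_probability(trusted, given_tag, t), t)
              for t in candidates]
    scored = [(p, t) for p, t in scored if p > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _p, t in scored[:top_n]]


# Made-up sample data: one account tags everything with a spam keyword.
taggings = [
    ("alice", "o1", "python"), ("alice", "o1", "programming"),
    ("bob", "o2", "python"), ("bob", "o2", "programming"),
    ("carol", "o3", "python"), ("carol", "o3", "snake"),
    ("spammer", "o1", "cheap-pills"), ("spammer", "o2", "cheap-pills"),
    ("spammer", "o3", "cheap-pills"),
]
authority = {"alice": 0.9, "bob": 0.8, "carol": 0.7, "spammer": 0.1}

suggestions = suggest_tags(taggings, "python", authority)
```

In this toy data the spam keyword co-occurs with "python" on every object, so co-occurrence alone would rank it first. Filtering on the tagger's authority is what keeps it out of the suggestions.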

How Quickly Does Google Update its Database(s)?

David Degrelle, of e-SEMA (European Search Engine Marketing Alliance), tells me that Google is pretty quick these days, and his blog post, “Google calculates SERP’s in 14 minutes and goes real time!”, points to a quick turnover time.

I’d be a little surprised if Google refreshed its whole index in that short a period of time – there are so many websites to crawl, including new sites and updated pages, that it seems unlikely. Yet blogs that ping Google when there are updates may find the content of their pages getting indexed pretty quickly. And I’ve seen times indicated for news results at Google News that are minutes old, rather than hours.

ChaCha Search – Is People Powered Search Better?

I really hadn’t taken a close look at ChaCha Search before today, but the idea is interesting – using human search guides who can ask you specific questions about what you are looking for, and who will help you find answers.

A question raised by the approach is how well it can scale – can it handle questions from a lot of people, and are there enough expert searchers who would participate?

I’ve run across two patent applications assigned to them, and an unassigned one that lists the CEO of the company as the inventor and is referred to by one of the assigned patent filings:

Search Tool Providing Optional Use of Human Search Guides
Invented by Scott A. Jones and Thomas F. Cooper
Assigned to: ChaCha Search, Inc.
US Patent Application 20070174273
Published July 26, 2007
Filed: September 1, 2006

Why Sometimes Best Search Results aren’t Always Top Search Results

When we talk about the results that show up in search engines, we often do so in terms related to relevance and importance of those results.

Sometimes the results we see, and that we don’t see, are influenced by other factors, such as steps taken by the search engines to reduce the amount of work that they have to perform in order to return results to searchers.

Using Two Tiers of Search Results

If a search potentially returns thousands of results, and people only look at the first few pages of those results, it would make sense for a search engine to serve results in batches, and perhaps only initially use a modified (and much smaller) version of their database to answer search queries.

A first index tier may have a number of potential results pruned, so that only the documents more likely to be returned as top answers to searches are kept. The first batch of results returned to searchers may be taken from this pruned index.
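
A minimal sketch of the two-tier idea might look like the following: a small first tier keeps only the best-scoring documents per term, and the full index is consulted only when the first tier can't supply enough results. The scoring values, the `keep` cutoff, and the function names are assumptions for illustration, not details from the patent filing.

```python
def build_index(docs):
    """Build term -> [(score, doc_id), ...], best-scoring documents first.

    docs maps doc_id -> (text, score); the score is a stand-in for
    whatever ranking signal a search engine might use.
    """
    index = {}
    for doc_id, (text, score) in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, []).append((score, doc_id))
    for postings in index.values():
        postings.sort(reverse=True)
    return index


def prune_index(index, keep):
    """First tier: keep only the top `keep` documents for each term."""
    return {term: postings[:keep] for term, postings in index.items()}


def search(term, first_tier, full_index, wanted):
    """Answer from the pruned tier; fall back to the full index if it is short."""
    results = first_tier.get(term, [])[:wanted]
    tier = "first"
    if len(results) < wanted and len(full_index.get(term, [])) > len(results):
        results = full_index[term][:wanted]
        tier = "full"
    return [doc_id for _score, doc_id in results], tier


# Made-up documents and scores.
docs = {
    "a": ("search engine index", 0.9),
    "b": ("search results page", 0.7),
    "c": ("search tips", 0.4),
    "d": ("index pruning", 0.6),
}
full_index = build_index(docs)
first_tier = prune_index(full_index, keep=2)

first_page = search("search", first_tier, full_index, wanted=2)
deeper_page = search("search", first_tier, full_index, wanted=3)
```

Here the first page of results comes entirely from the pruned tier, while a request for a third result forces a trip to the full index. That is also how a good document that was pruned from the first tier could end up missing from the top results.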

Google Patent Granted on Semantic Units (Meaningful Compounds)

When searchers type a query into a search engine, it isn’t uncommon for them to use more than one word. It also isn’t unusual for those words to be a semantically meaningful phrase rather than just a list of keywords.

Multiple search terms entered by a user are often more useful if considered by the search engine as a single compound unit. Assume that a user enters the search terms “baldur’s gate download.”

The user intends for this query to return web pages that are relevant to the user’s intention of downloading the computer game called “baldur’s gate.” Although “baldur’s gate” includes two words, the two words together form a single semantically meaningful unit.

If the search engine is able to recognize “baldur’s gate” as a single semantic unit, called a compound herein, the search engine is more likely to return the web pages desired by the user.
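
As a rough illustration of treating a compound as a single unit, the sketch below splits a query into units with a greedy longest-match against a list of known compounds. How the patent actually identifies compounds isn't covered in this excerpt, so the `known_compounds` set, the `max_len` limit, and the matching strategy are all assumptions.

```python
def segment_query(query, known_compounds, max_len=3):
    """Split a query into units, preferring the longest known compound.

    known_compounds is a hypothetical set of lowercase multi-word phrases.
    Single words always count as units, so the loop always advances.
    """
    words = query.lower().split()
    units = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in known_compounds:
                units.append(candidate)
                i += n
                break
    return units


units = segment_query("baldur's gate download", {"baldur's gate"})
```

With "baldur's gate" in the compound list, the query splits into two units, the game title and "download", rather than three unrelated keywords.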

Looking at Users’ Final Landing Pages to Develop Suggestions for Query Refinements

It’s getting pretty common for search engines to suggest query revisions when someone does a search these days.

One common query revision strategy is to look at the query sessions from previous searchers who used the same query, and see how they might have refined their searches, including spelling corrections, or adding and deleting words in subsequent queries during the same session.
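
The session-based strategy can be sketched in a few lines: count which queries searchers issued immediately after a given query, and offer the most frequent ones as suggestions. The session format (an ordered list of query strings per searcher) and the function name are assumptions for illustration, and real query logs would need much more cleanup than this.

```python
from collections import Counter


def refinement_suggestions(sessions, query, top_n=3):
    """Suggest the queries that most often directly followed `query`.

    sessions is a list of sessions, each an ordered list of query strings
    (a hypothetical log format).
    """
    counts = Counter()
    for session in sessions:
        for current, following in zip(session, session[1:]):
            if current == query and following != query:
                counts[following] += 1
    return [q for q, _count in counts.most_common(top_n)]


# Made-up sessions.
sessions = [
    ["jaguar", "jaguar car"],
    ["jaguar", "jaguar cat", "jaguar habitat"],
    ["jaguar", "jaguar car", "jaguar car price"],
]

suggested = refinement_suggestions(sessions, "jaguar")
```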

A paper from Microsoft researchers, Query Suggestion based on User Landing Pages, takes that approach, and looks at using it in conjunction with another approach that looks at what they call “final landing pages.”

This poster investigates a novel query suggestion technique that selects query refinements through a combination of many users’ post-query navigation patterns and the query logs of a large search engine. We compare this technique, which uses the queries that retrieve in the top-ranked search results places where searchers end up after post-query browsing (i.e., the landing pages), with an approach based on query refinements from user search sessions extracted from query logs.

The Influence of Search Result Listings (Captions) on Clickthroughs

You type in a query in a search box at Google or Yahoo or live.com or Ask, and hit the search button.

In the search results, you see lists of links to pages that should have something to do with the keyword phrase that you typed into the search engine. Which do you click upon? How might the words in the caption – the title, snippet and URL – influence what people will click upon?

That’s the question raised in Microsoft’s The Influence of Caption Features on Clickthrough Patterns in Web Search (pdf).

I really like it when the search engines share some of their findings on usability issues around the way people search. Here’s the quick answer, from the paper’s abstract:

The findings of our study suggest that relatively simple caption features such as the presence of all terms query terms, the readability of the snippet, and the length of the URL shown in the caption, can significantly influence users’ Web search behavior.
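
Two of those caption features are easy to compute for any search result, as in the sketch below. It checks whether every query term appears in the title or snippet, and measures the length of the displayed URL. The function name and the word-matching rule are my own assumptions, and I've left out snippet readability because the excerpt doesn't say how the paper measures it.

```python
import re


def caption_features(query, title, snippet, url):
    """Compute two simple caption features for one search result."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    caption_words = set(re.findall(r"\w+", (title + " " + snippet).lower()))
    return {
        # True when every query term shows up in the title or snippet.
        "all_query_terms_present": query_terms <= caption_words,
        # Character length of the URL as displayed in the caption.
        "url_length": len(url),
    }


# Made-up search result.
features = caption_features(
    query="python tutorial",
    title="Python Tutorial for Beginners",
    snippet="Learn the basics.",
    url="https://example.com/python",
)
```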

Long Tail Studies by Web Search Researchers

Microsoft researchers are starting to take a closer look at search queries that are common and compare them to those that appear more rarely, in Heads and Tails: Studies of Web Search with Common and Rare Queries.

The paper immediately had me thinking of the writings of Chris Anderson, who started online marketers and ecommerce site owners thinking differently about products offered on the Web, in an article that he wrote for Wired Magazine called The Long Tail.

The article became a book, and led to a blog by its author, and has inspired many folks to look at ecommerce while paying attention to the long tail.

eBay has also taken the idea of the long tail seriously enough to file a patent application that applies it in the context of keywords. The patent filing has the incredibly long name Computer-Implemented Method and System for Combining Keywords into Logical Clusters That Share Similar Behavior with Respect to a Considered Dimension.
