Yahoo Collaborative Tagging Suggestions Use Goodness to Combat Tag Spam

Tagging allows people to label content with keywords, so that they can share what they find, recall what they’ve looked at before, and discover content that others have labeled.

Tagging is also prone to spam and to poor tag suggestions. A Goodness Measure could be used to suggest tags while keeping bad tags and spam out of those suggestions. Such a measure might look at:

  • The authority of the person tagging,
  • The probability that a person who tags an object with one keyword will also tag that object with another keyword that frequently co-occurs with the first in other people’s tags for that object,
  • The probability that any object tagged with one keyword is also tagged with the other keyword, based upon tags used by others, and
  • A Goodness Measure Score of the tag for the object, which sums the authority scores of all users who have assigned that tag to that object.
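The last item is the easiest to make concrete. Here is a minimal Python sketch. It assumes that tag assignments are stored as hypothetical (user, object, tag) triples and that each user’s authority score is already known. The names `goodness_scores`, `assignments`, and `authority` are illustrative and do not come from the patent application.

```python
from collections import defaultdict


def goodness_scores(assignments, authority):
    """Goodness of (tag, object) = sum of a(u) over users who applied that tag to that object.

    assignments: iterable of (user, obj, tag) triples (hypothetical storage format)
    authority:   dict mapping user -> authority score a(u); unknown users count as 0
    """
    scores = defaultdict(float)
    # A set removes duplicate triples, so each user counts once per (tag, object).
    for user, obj, tag in set(assignments):
        scores[(tag, obj)] += authority.get(user, 0.0)
    return dict(scores)


# Illustrative data: two trusted users agree on "sunset"; a low-authority account adds spam.
assignments = [
    ("alice", "photo1", "sunset"),
    ("bob", "photo1", "sunset"),
    ("bob", "photo1", "sunset"),  # duplicate entry, ignored
    ("carol", "photo1", "cheap-pills"),
]
authority = {"alice": 0.5, "bob": 0.25, "carol": 0.125}
scores = goodness_scores(assignments, authority)
print(scores[("sunset", "photo1")])       # 0.75
print(scores[("cheap-pills", "photo1")])  # 0.125
```

Both tags appear on the same photo. The spam tag is backed only by a low-authority account, though, so it scores well below the tag that trusted users agree on.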

A Yahoo patent application, Systems and methods for collaborative tag suggestions, published last week, provides some ideas on how such a system could be created and used.

Tagging plays a strong role in three Yahoo offerings: Flickr, Del.icio.us, and My Web 2.0 (now Yahoo Bookmarks). Would it be a surprise if they started offering collaborative tag suggestions? When you start typing a tag in Del.icio.us, a dropdown may appear that shows suggestions. Why offer these suggestions, and how do they benefit users?

Here’s the abstract from the patent application:

Abstract

A set of general criteria have been defined to improve the efficacy of a tagging system, and have been applied to present collaborative tag suggestions to a user. The collaborative tag suggestions are based on a goodness measure for tags derived from collective user authorities to combat spam. The goodness measure is iteratively adjusted by a reward-penalty algorithm during tag selection. The collaborative tag suggestions can also incorporate other sources of tags, e.g., content-based auto-generated tags.
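The abstract does not say how the reward and penalty are computed. The Python sketch below shows one hypothetical reading of "iteratively adjusted by a reward-penalty algorithm during tag selection." Tags are picked greedily by score. After each pick, every remaining tag is rewarded in proportion to how often the same users pair it with the picked tag on this object ($P_s$), which suggests a natural combination. It is also penalized in proportion to how often it co-occurs with the picked tag across all objects ($P_a$), which suggests a redundant tag for the same facet. The signs, the weights `reward` and `penalty`, and the function name are all assumptions, not details from the patent application.

```python
def suggest_tags(scores, p_same, p_all, k=3, reward=0.5, penalty=1.0):
    """Greedy tag selection with an assumed reward-penalty update after each pick.

    scores: dict tag -> initial goodness score for one object
    p_same: dict (ti, tj) -> P_s(ti | tj; o), same-user co-occurrence on this object
    p_all:  dict (ti, tj) -> P_a(ti | tj), co-occurrence across all objects
    reward, penalty: hypothetical weights, not values from the patent application
    """
    remaining = dict(scores)
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=remaining.get)  # highest current score
        best_score = remaining.pop(best)
        chosen.append(best)
        for tag in remaining:
            # Assumed update: reward tags users pair with `best` on this object,
            # penalize tags that globally co-occur with it (likely redundant).
            bonus = reward * p_same.get((tag, best), 0.0)
            cost = penalty * p_all.get((tag, best), 0.0)
            remaining[tag] += best_score * (bonus - cost)
    return chosen


scores = {"nyc": 10.0, "newyork": 9.0, "skyline": 6.0, "night": 5.0}
p_all = {("newyork", "nyc"): 0.9, ("skyline", "nyc"): 0.2, ("night", "nyc"): 0.1}
p_same = {("skyline", "nyc"): 0.5, ("night", "nyc"): 0.4}
print(suggest_tags(scores, p_same, p_all))  # ['nyc', 'skyline', 'night']
```

A plain ranking by score would have suggested nyc, newyork, and skyline. Here the penalty demotes the near-duplicate newyork, so the last slot goes to a tag that describes a different aspect of the photo.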

Tagging

Tagging allows users to enter free-form labels for any object, without having to classify those objects into some universal ontology. Taggers can also use combinations of tags.

There are some difficulties in using tags, though:

  1. Unlike physical objects, digital content is seldom semantically pure enough to fit into a single category.
  2. It is difficult to predict the paths a user would take to discover a given object.
  3. The number of tags tends to multiply at an exorbitant rate.
  4. The structure of a traditional hierarchy disappears.

Faceted Classification

Tagging is similar in some ways to faceted classification, which uses “clearly defined, mutually exclusive, and collectively exhaustive aspects to describe objects.”

The patent application provides the example of a music piece, which can be identified by facets such as artist, album, genre, and composer.
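Here is a small Python sketch that sets a faceted record for a music piece beside a free-form tag set. The facet values and the helper `facet_lookup` are hypothetical placeholders.

```python
# A faceted record: each key is a clearly defined facet with exactly one value.
track_1 = {
    "artist": "Example Artist",
    "album": "Example Album",
    "genre": "jazz",
    "composer": "Example Composer",
}
track_2 = {"artist": "Another Artist", "album": "Second Album", "genre": "jazz"}

# Free-form tags for the same track: labels with no declared facet.
track_1_tags = {"jazz", "trumpet", "late-night", "examplealbum"}

records = {"track-1": track_1, "track-2": track_2}


def facet_lookup(records, facet, value):
    """Return ids of records whose given facet equals value, in sorted order."""
    return sorted(rid for rid, facets in records.items() if facets.get(facet) == value)


print(facet_lookup(records, "genre", "jazz"))                # ['track-1', 'track-2']
print(facet_lookup(records, "composer", "Example Composer"))  # ['track-1']
```

With facets, "which tracks are jazz?" has one unambiguous place to look. With free-form tags, "jazz" might be a genre, a mood, or a playlist name, and nothing in the tag set says which.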

A faceted system created by experts is likely to be more complete than free-form tagging of the same objects.

How this system works

Collaborative filtering is used to suggest tags to users, supposedly “leveraging the collective wisdom of groups of users.”

The suggested tags have properties that include:

  • High coverage of multiple facets (covering different aspects or facets),
  • High popularity, and
  • Least effort.

If the tags are used by a large number of people for a particular object, these tags are likely to be used by a new user for the given object.
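Popularity by itself is simple to compute. This minimal Python sketch assumes hypothetical (user, object, tag) triples and counts the distinct users who applied each tag to one object. It ignores authority on purpose. Because of that, a group of spammers could game it easily if nothing else were applied.

```python
from collections import defaultdict


def popular_tags(assignments, obj, k=3):
    """Rank tags on obj by how many distinct users applied them (ties broken alphabetically)."""
    users_per_tag = defaultdict(set)
    for user, o, tag in assignments:
        if o == obj:
            users_per_tag[tag].add(user)
    ranked = sorted(users_per_tag, key=lambda t: (-len(users_per_tag[t]), t))
    return ranked[:k]


assignments = [
    ("alice", "page1", "python"), ("alice", "page1", "tutorial"),
    ("bob", "page1", "python"),
    ("carol", "page1", "python"), ("carol", "page1", "tutorial"), ("carol", "page1", "snakes"),
    ("dave", "page2", "snakes"),
]
print(popular_tags(assignments, "page1", k=2))  # ['python', 'tutorial']
```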

Least-effort has two meanings as described in this document:

  1. the number of objects identified by the suggested tag combination should be small,
  2. the number of tags for identifying an object should be minimized as well.
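Both conditions can be checked against an inverted index that maps tags to objects. The Python sketch below uses a hypothetical `tag_index`, and its helper names are illustrative only. One helper finds the objects that a tag combination identifies. The other does a brute-force search for the smallest combination of an object’s own tags that identifies that object alone.

```python
from itertools import combinations


def objects_identified(tag_index, tags):
    """Return the set of objects that carry every tag in the combination."""
    tags = list(tags)
    if not tags:
        return set()
    result = set(tag_index.get(tags[0], set()))
    for tag in tags[1:]:
        result &= tag_index.get(tag, set())
    return result


def smallest_identifying_combo(tag_index, obj, obj_tags):
    """Brute-force search for the fewest of obj's tags that single out obj alone."""
    for size in range(1, len(obj_tags) + 1):
        for combo in combinations(sorted(obj_tags), size):
            if objects_identified(tag_index, combo) == {obj}:
                return combo
    return None  # no combination of its own tags singles out obj


# Hypothetical inverted index: tag -> objects carrying that tag.
tag_index = {
    "python": {"a", "b", "c", "d"},
    "tutorial": {"a", "b", "e"},
    "beginner": {"a", "f"},
}
print(sorted(objects_identified(tag_index, ["python", "tutorial"])))  # ['a', 'b']
print(smallest_identifying_combo(tag_index, "a", {"python", "tutorial", "beginner"}))
# ('beginner', 'python')
```

The search is exponential in the number of tags. That is acceptable for the handful of tags a single object usually carries, but it would not scale to large tag sets.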

This makes it easier to find tagged content again.

The patent application delves pretty deeply into:

  • How annotations may be made,
  • How user profiles can be created to understand who the people are who are tagging content,
  • How a social network can be used to understand relationships between different taggers,
  • How the co-occurrence of tags can be used to cover more facets of an object,
  • How tags might be suggested based upon collaborative assessment of previously applied tags, from members of a social network,
  • How an autocompletion function can be used to suggest tags.
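A basic autocompletion step takes only a few lines. This Python example assumes a hypothetical dictionary of known lowercase tags with precomputed scores, such as goodness scores. It returns the highest-scoring tags that start with what the user has typed so far. It is only an illustration and is not the method described in the patent application.

```python
def autocomplete(prefix, tag_scores, limit=5):
    """Return up to `limit` known tags starting with `prefix`, best score first.

    Ties are broken alphabetically so results are stable.
    """
    prefix = prefix.lower()
    matches = [tag for tag in tag_scores if tag.startswith(prefix)]
    matches.sort(key=lambda tag: (-tag_scores[tag], tag))
    return matches[:limit]


tag_scores = {"travel": 12.0, "train": 7.5, "tram": 7.5, "tree": 3.0, "sunset": 9.0}
print(autocomplete("tr", tag_scores))            # ['travel', 'train', 'tram', 'tree']
print(autocomplete("TRA", tag_scores, limit=2))  # ['travel', 'train']
```

A linear scan is fine for a sketch. A service with a large tag vocabulary would more likely use a sorted list with binary search, or a trie.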

Calculating the Goodness Measure

The idea behind the Goodness Measure is to try to help offer suggestions of “good” tags.

Here are the variables used to calculate the Goodness Measure:

  1. An authority score, $a(u)$,
  2. A probability function associated with the same user, $P_s(t_i \mid t_j; o)$,
  3. A probability function associated with all users, $P_a(t_i \mid t_j)$, and
  4. A goodness measure, $VC(t, o)$,

where $u$ denotes a user, $o$ denotes an object, and $t$, $t_i$, and $t_j$ are tags.
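The post does not spell out how the two probabilities are estimated. One straightforward reading treats them as conditional frequencies over hypothetical (user, object, tag) triples, as sketched in Python below. Under that reading, $P_s(t_i \mid t_j; o)$ is the share of users who tagged $o$ with $t_j$ and also tagged it with $t_i$. $P_a(t_i \mid t_j)$ is the share of objects tagged $t_j$ by anyone that also carry $t_i$. These estimators and the function names are assumptions, not formulas from the patent application.

```python
from collections import defaultdict


def build_indexes(assignments):
    """Index (user, obj, tag) triples two ways for the probability estimates."""
    user_tags = defaultdict(set)  # (user, obj) -> tags that user put on obj
    obj_tags = defaultdict(set)   # obj -> tags anyone put on obj
    for user, obj, tag in assignments:
        user_tags[(user, obj)].add(tag)
        obj_tags[obj].add(tag)
    return user_tags, obj_tags


def p_same_user(user_tags, ti, tj, obj):
    """Estimate P_s(ti | tj; obj): of users who tagged obj with tj, the share who also used ti."""
    with_tj = [tags for (_, o), tags in user_tags.items() if o == obj and tj in tags]
    if not with_tj:
        return 0.0
    return sum(ti in tags for tags in with_tj) / len(with_tj)


def p_all_users(obj_tags, ti, tj):
    """Estimate P_a(ti | tj): of objects tagged tj by anyone, the share also tagged ti."""
    with_tj = [tags for tags in obj_tags.values() if tj in tags]
    if not with_tj:
        return 0.0
    return sum(ti in tags for tags in with_tj) / len(with_tj)


assignments = [
    ("alice", "p1", "beach"), ("alice", "p1", "sunset"),
    ("bob", "p1", "beach"),
    ("carol", "p2", "beach"), ("carol", "p2", "sunset"),
    ("dave", "p3", "beach"),
]
user_tags, obj_tags = build_indexes(assignments)
print(p_same_user(user_tags, "sunset", "beach", "p1"))  # 0.5
print(p_all_users(obj_tags, "sunset", "beach"))         # 0.6666666666666666
```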

The first part, the authority score, is tied to the person doing the tagging. The more consistently a person tags with the majority, the higher their authority score.
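The post does not give the authority formula. One plausible way to get the behavior described is a mutual-reinforcement loop in the style of the HITS algorithm, sketched below in Python. A tag’s goodness on an object is the sum of its taggers’ authority, and a user’s authority is the sum of the goodness of the tags that user assigned. Both are normalized each round. This is an assumption for illustration, not the patent application’s stated procedure, and the function and data names are hypothetical.

```python
import math
from collections import defaultdict


def authority_scores(assignments, iterations=20):
    """Mutual reinforcement (assumed, HITS-style) between user authority and tag goodness.

    assignments: iterable of (user, obj, tag) triples
    Returns (authority, goodness): dicts user -> a(u) and (tag, obj) -> score.
    """
    triples = set(assignments)
    users = {user for user, _, _ in triples}
    authority = {user: 1.0 for user in users}
    goodness = {}
    for _ in range(iterations):
        # Goodness of (tag, obj): sum of authority of the users who applied it.
        raw_goodness = defaultdict(float)
        for user, obj, tag in triples:
            raw_goodness[(tag, obj)] += authority[user]
        norm = math.sqrt(sum(v * v for v in raw_goodness.values())) or 1.0
        goodness = {key: v / norm for key, v in raw_goodness.items()}
        # Authority of a user: sum of goodness of the (tag, obj) pairs they assigned.
        raw_authority = defaultdict(float)
        for user, obj, tag in triples:
            raw_authority[user] += goodness[(tag, obj)]
        norm = math.sqrt(sum(v * v for v in raw_authority.values())) or 1.0
        authority = {user: raw_authority[user] / norm for user in users}
    return authority, goodness


# Three users agree on "sunset"; one account tags the same photo with spam.
assignments = [
    ("alice", "photo1", "sunset"),
    ("bob", "photo1", "sunset"),
    ("carol", "photo1", "sunset"),
    ("dave", "photo1", "buy-now"),
]
authority, goodness = authority_scores(assignments)
print(authority["alice"] > authority["dave"])  # True
```

In this toy example, the spam account never agrees with anyone, so its authority shrinks toward zero with each round. That matches the idea that authority rises with agreement with the majority.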

The other parts of the Goodness Measure are explained more fully in the Yahoo paper Towards the Semantic Web: Collaborative Tag Suggestions. I pointed to that paper in a Search Engine Land post, The Social Side Of Trustrank, which describes a good number of related papers and patent filings.

A little more…

The site Boxes and Arrows sometimes takes a look back at the history of information architecture. While reading through this patent application, I was reminded of a couple of these Boxes and Arrows articles:


4 thoughts on “Yahoo Collaborative Tagging Suggestions Use Goodness to Combat Tag Spam”

  1. I have looked into social bookmarking, or tagging, from an SEO point of view. Although it can be used to drive traffic to a website, I have yet to see any of this traffic convert into sales. I have talked to several people who experimented with a tagging cooperative (a spam colony of sorts), and they came to the same conclusion…. What’s the point!

  2. Consider it less from the perspective of capturing links, and more from the perspective of capturing people’s attention.

    For instance, if you are the owner of a Bed and Breakfast in a historical town, you could fill up Flickr with photographs of the views and activities that take place in that town, and nearby. You could join with other merchants and service providers to paint a portrait of the place that could be pretty attractive to visitors, and tag those images in meaningful ways that help others find them.

    Flickr not only allows you to tag images, but also to geotag them, so your images may be found when someone starts looking at maps. Flickr images are also incorporated into Yahoo’s image search, and there’s a decent chance that the annotations and tags used for those pictures (if they are “good” tags) will help determine which queries those images appear for.

    If you are a company that specializes in manufacturing a specific type of machinery, and you take pictures of those machines, the manufacturing process, and the machines in operation, those images, again, could be found in Yahoo searches if you take the time to put them in Flickr and use good tags.

    Google, Yahoo, Microsoft, and Ask have all stated in one way or another that they are paying more attention to what their users are doing on the Web. Annotations and tagging are explicitly mentioned in many of the papers and patent filings that I’ve seen.

    It’s worth exploring, in a way that isn’t spammy. There are a lot of opportunities to use tags for images, bookmarks, and websites because it’s a fairly new and developing area, and it doesn’t hurt to be involved from the early days.

  3. Offering tagging selections is one thing but to give a score is another. I like how I tag my content and really wouldn’t want to be rated on it… just my view.
