Yahoo Collaborative Tagging Suggestions Use Goodness to Combat Tag Spam
Tagging allows people to assign labels to content using keywords, so that they can share what they find, recall what they’ve looked at before, and discover content that others have labeled.
Tagging is also prone to spam and to poor tag suggestions. A Goodness Measure might be used to suggest tags while keeping bad tags and spam out of those suggestions. It looks at:
- The authority of a person tagging,
- The probability that a person tagging an object with one keyword might tag the same object with another keyword that frequently co-occurs with the first one in the tags used by others for that object,
- The probability that any object tagged with one keyword is tagged with the other keyword, based upon tags used by others,
- A Goodness Measure Score of the tag to the object, which uses the sum of the authority scores of all users who have assigned that tag to that object.
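The two probabilities in that list can be estimated by simple counting. Here is a minimal sketch, assuming tag data is available as (user, object, tag) triples; the sample data and the function names are hypothetical, and the patent application may estimate these values differently.

```python
from collections import defaultdict

# Hypothetical sample data: (user, object, tag) triples.
ASSIGNMENTS = [
    ("ann", "page1", "python"),
    ("ann", "page1", "tutorial"),
    ("bob", "page1", "python"),
    ("bob", "page2", "python"),
    ("bob", "page2", "tutorial"),
    ("cat", "page2", "tutorial"),
]


def p_same_user(assignments, t_i, t_j, obj):
    """Among users who tagged obj with t_j, the fraction who also used t_i on obj."""
    tags_by_user = defaultdict(set)
    for user, o, tag in assignments:
        if o == obj:
            tags_by_user[user].add(tag)
    with_j = [tags for tags in tags_by_user.values() if t_j in tags]
    if not with_j:
        return 0.0
    return sum(1 for tags in with_j if t_i in tags) / len(with_j)


def p_all_users(assignments, t_i, t_j):
    """Among objects tagged t_j by anyone, the fraction also tagged t_i by anyone."""
    tags_by_object = defaultdict(set)
    for _user, o, tag in assignments:
        tags_by_object[o].add(tag)
    with_j = [tags for tags in tags_by_object.values() if t_j in tags]
    if not with_j:
        return 0.0
    return sum(1 for tags in with_j if t_i in tags) / len(with_j)
```

On the sample data, `p_same_user(ASSIGNMENTS, "tutorial", "python", "page1")` looks only at what each person did on page1, while `p_all_users(ASSIGNMENTS, "tutorial", "python")` pools everyone's tags per object.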
A patent application from Yahoo on Systems and methods for collaborative tag suggestions, published last week, provides some ideas on how such a system could be created and used.
Tagging plays a strong role in three of the offerings from Yahoo – Flickr, Del.icio.us, and My Web 2.0 (now Yahoo Bookmarks). Would it be a surprise for them to start offering collaborative tagging suggestions? When you start typing a tag in Del.icio.us, a dropdown may appear which shows suggestions. Why offer these suggestions, and how do they benefit users?
Here’s the abstract from the patent application:
A set of general criteria have been defined to improve the efficacy of a tagging system, and have been applied to present collaborative tag suggestions to a user. The collaborative tag suggestions are based on a goodness measure for tags derived from collective user authorities to combat spam. The goodness measure is iteratively adjusted by a reward-penalty algorithm during tag selection. The collaborative tag suggestions can also incorporate other sources of tags, e.g., content-based auto-generated tags.
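The abstract doesn't spell out the reward-penalty step, so the following is only a rough sketch of how such a greedy selection might look. It assumes that after each pick, the remaining tags are rewarded when the same users tend to pair them with the pick, and penalized when they overlap with it across all users. The update formula, the function name, and the lookup tables `p_same` and `p_all` are assumptions for illustration, not the filing's exact method.

```python
def suggest_tags(scores, p_same, p_all, k=3):
    """Greedily pick up to k tags, adjusting the remaining scores after each pick.

    scores: {tag: goodness score for one object}
    p_same: {(tag, picked): same-user co-occurrence probability}  (reward)
    p_all:  {(tag, picked): all-user co-occurrence probability}   (penalty)
    """
    remaining = dict(scores)  # copy so the caller's scores are untouched
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=remaining.get)
        best_score = remaining.pop(best)
        chosen.append(best)
        for tag in remaining:
            reward = p_same.get((tag, best), 0.0) * best_score
            penalty = p_all.get((tag, best), 0.0) * best_score
            remaining[tag] += reward - penalty
    return chosen


# Hypothetical numbers: "programming" is nearly redundant with "python",
# while "tutorial" adds a different facet.
example_scores = {"python": 5.0, "programming": 4.5, "tutorial": 3.0}
example_p_same = {("tutorial", "python"): 0.5}
example_p_all = {("programming", "python"): 0.9, ("tutorial", "python"): 0.2}
picked = suggest_tags(example_scores, example_p_same, example_p_all, k=2)
```

With these made-up numbers, the second suggestion is "tutorial" rather than the higher-scoring "programming", because "programming" loses most of its score once "python" has been chosen.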
Tagging allows users to enter free form labels for any object, and they don’t have to try to classify those objects into some universal ontology. Taggers can also use combinations of tags.
There are some difficulties in using tags, though:
- Unlike physical objects, digital content is seldom semantically pure enough to fit in a specific category.
- It is difficult to predict the paths that a user would explore to discover a given object.
- The number of tags tends to multiply at an exorbitant rate.
- The structure of a traditional hierarchy disappears.
Tagging is similar in some ways to faceted classification, which uses “clearly defined, mutually exclusive, and collectively exhaustive aspects to describe objects.”
The patent application provides the example of a piece of music, which can be identified by facets such as artist, album, genre, and composer.
A faceted system created by experts is going to be more complete than what is found in the free form tagging of objects.
How this system works
Collaborative filtering is used to suggest tags to users, supposedly “leveraging the collective wisdom of groups of users.”
The suggested tags have properties that include:
- High coverage of multiple facets (covering different aspects or facets),
- High popularity, and
- Least effort.
Popularity matters because if a large number of people have used a tag for a particular object, a new user is likely to use that tag for the object as well.
Least-effort has two meanings as described in this document:
- the number of objects identified by the suggested tag combination should be small,
- the number of tags for identifying an object should be minimized as well.
This makes it easier to find tagged content again.
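Both meanings of least effort can be illustrated with a small sketch. It assumes each object simply carries a set of tags; the sample data and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical sample data: the tags attached to each object.
TAGS_BY_OBJECT = {
    "page1": {"python", "tutorial"},
    "page2": {"python", "reference"},
    "page3": {"python", "tutorial", "video"},
}


def objects_matching(tags_by_object, combo):
    """Return the objects that carry every tag in the combination."""
    wanted = set(combo)
    return sorted(obj for obj, tags in tags_by_object.items() if wanted <= tags)


def shortest_identifying_combo(tags_by_object, obj):
    """Return the smallest tag combination that matches only obj, or None."""
    tags = sorted(tags_by_object[obj])
    for size in range(1, len(tags) + 1):
        for combo in combinations(tags, size):
            if objects_matching(tags_by_object, combo) == [obj]:
                return combo
    return None
```

Here "python" alone matches all three pages, adding "tutorial" narrows that to two, and "video" by itself is enough to single out page3. Nothing in page1's own tags separates it from page3, which is the kind of gap a better tag suggestion could close.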
The patent application delves pretty deeply into:
- How annotations may be made,
- How user profiles can be created to understand who the people are who are tagging content,
- How a social network can be used to understand relationships between different taggers,
- How the co-occurrence of tags can be used to cover more facets of an object,
- How tags might be suggested based upon collaborative assessment of previously applied tags, from members of a social network,
- How an autocompletion function can be used to suggest tags.
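As an illustration of the last item, an autocompletion function might filter known tags by the typed prefix and order them by score. This is a minimal sketch, assuming each tag already has a numeric score; the function name and the sample scores are hypothetical.

```python
def autocomplete(prefix, tag_scores, limit=5):
    """Return up to `limit` tags starting with `prefix`, highest score first."""
    prefix = prefix.lower()
    matches = [tag for tag in tag_scores if tag.lower().startswith(prefix)]
    # Sort by descending score, breaking ties alphabetically.
    return sorted(matches, key=lambda tag: (-tag_scores[tag], tag))[:limit]


# Hypothetical scores for tags on one object.
SAMPLE_SCORES = {"python": 5.0, "programming": 4.0, "photo": 2.0, "java": 3.0}
```

Typing "p" would then offer "python", "programming", and "photo" in that order, and typing "pr" would narrow the dropdown to "programming".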
Calculating the Goodness Measure
The idea behind the Goodness Measure is to try to help offer suggestions of “good” tags.
Here are the variables used to calculate the Goodness Measure (referred to above):
- An authority score “a(u),”
- A probability function associated with the same user “P_s(t_i|t_j; o),”
- A probability function associated with all users “P_a(t_i|t_j),” and
- A goodness measure “VC(t, o), where u denotes a user, o denotes an object, and t, t_i, and t_j are tags.”
The first part, the authority score, is tied to the person doing the tagging. The more consistently a person tags with the majority, the higher their authority score.
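Since the goodness of a tag depends on the authority of the people using it, and authority depends on how well a person's tags agree with everyone else's, the two can be computed by going back and forth between them. The sketch below sums authority scores to get goodness, as described above. The authority update (the average goodness of a user's tags, rescaled so the mean authority is 1) is an assumption made for illustration, as are the function names and sample data.

```python
from collections import defaultdict


def goodness_scores(assignments, authority):
    """Goodness of (tag, object): sum of the authority of users who applied it."""
    scores = defaultdict(float)
    for user, obj, tag in set(assignments):  # set() ignores duplicate triples
        scores[(tag, obj)] += authority[user]
    return dict(scores)


def update_authority(assignments, scores):
    """Assumed rule: authority is the mean goodness of a user's tags, rescaled."""
    per_user = defaultdict(list)
    for user, obj, tag in set(assignments):
        per_user[user].append(scores[(tag, obj)])
    raw = {user: sum(vals) / len(vals) for user, vals in per_user.items()}
    if not raw:
        return {}
    mean = sum(raw.values()) / len(raw)
    return {user: value / mean for user, value in raw.items()}


def score_tags(assignments, rounds=10):
    """Alternate between the two updates, starting everyone at authority 1."""
    authority = {user: 1.0 for user, _obj, _tag in assignments}
    scores = goodness_scores(assignments, authority)
    for _ in range(rounds):
        authority = update_authority(assignments, scores)
        scores = goodness_scores(assignments, authority)
    return scores, authority


# Hypothetical sample: two people agree on a tag, one person tags alone.
SAMPLE = [
    ("ann", "page1", "python"),
    ("bob", "page1", "python"),
    ("spammer", "page1", "cheap-pills"),
]
scores, authority = score_tags(SAMPLE)
```

In this toy example the lone tagger's authority shrinks with each round, and the goodness of the tag only they used shrinks along with it, which is the spam-resistance effect the Goodness Measure is after.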
The other parts of the Goodness Measure are expanded upon more fully in the Yahoo paper Towards the Semantic Web: Collaborative Tag Suggestions, which I pointed to at Search Engine Land in a post which describes a good number of related papers and patent filings – The Social Side Of Trustrank.
A little more…
The site Boxes and Arrows sometimes takes a look back at the history of information architecture. While reading through this patent application, I was reminded of a couple of these Boxes and Arrows articles: