Social tags like those used by Flickr or Delicious are interesting in that they allow people to categorize their own efforts (and those of others) and share material based upon those classifications.
However, tagging can produce a fairly flat list of many categories. A hierarchical ordering of information is useful because it lets people browse and drill down through categories, which can make it easier for them to find the information they are looking for.
A Ph.D. student from Stanford, Paul Heymann, has been working with Professor Hector Garcia-Molina to find a way to build tag hierarchies that make the efforts of tagging more useful. He notes that:
Tagging systems are excellent at the task that they were designed for—allowing a large, disparate group of users to collaboratively label massive, dynamic information systems like the web, media collections of millions of images, and so on. We are working to make these systems better by automating production of hierarchical taxonomies that describe the data from the raw flat tags generated by users.
Some of the preliminary results of this effort can be found in a paper titled Collaborative Creation of Communal Hierarchical Taxonomies in Social Tagging Systems, which was published earlier this week.
In some ways, tagging seems similar to the use of metadata to describe the content of a document, web page, image, or whatever else is being tagged. One of the main differences is that tagging can be done by a community of viewers (or listeners), while metadata is normally defined by the creator of the object being described.
The paper uses two different sets of data: one from the social bookmarking service Del.icio.us, and the other from CiteULike, a scholarly tagging system for academic papers. (Awesome tool if you read academic whitepapers.) The differences in the way users tag these two resources make Del.icio.us a better candidate for adding a hierarchy than CiteULike. The reasons?
- Some CiteULike users don’t bother with tags, so tag density is low.
- There is little overlap between users, since many work in different fields of research.
- The tags used are much more detailed and less general.
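To make the first two reasons a little more concrete, here is a rough Python sketch of how tag density and user overlap might be measured on a toy data set. The data, the function names, and both definitions (share of tagged posts, and average Jaccard overlap between users' tag vocabularies) are my own assumptions for illustration; they are not taken from the paper, which may measure these things differently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (user, item, tags). An empty list means the user
# saved the item without tagging it.
posts = [
    ("ann", "paper1", ["clustering", "ml"]),
    ("ann", "paper2", []),
    ("bob", "paper3", ["protein-folding"]),
    ("bob", "paper4", []),
    ("cat", "paper1", ["clustering"]),
]

def tag_density(posts):
    """Fraction of posts that carry at least one tag (assumed definition)."""
    if not posts:
        return 0.0
    tagged = sum(1 for _, _, tags in posts if tags)
    return tagged / len(posts)

def mean_user_overlap(posts):
    """Average Jaccard overlap between the tag vocabularies of each pair
    of users (assumed definition)."""
    vocab = defaultdict(set)
    for user, _, tags in posts:
        vocab[user].update(tags)
    pairs = list(combinations(sorted(vocab), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = vocab[a] | vocab[b]
        if union:  # skip pairs where neither user has any tags
            total += len(vocab[a] & vocab[b]) / len(union)
    return total / len(pairs)

print(f"tag density: {tag_density(posts):.2f}")
print(f"mean user overlap: {mean_user_overlap(posts):.2f}")
```

In a data set like this one, where people skip tagging and researchers in different fields share few tags, both numbers come out low, which leaves less shared vocabulary for a hierarchy to be built from.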
It would be great to see some of the ideas described in this paper applied to the images at Flickr.