How a Search Engine Might Rank Bookmark Sets, Playlists, Directory Pages, and other Collection Items

Search engine optimization is an ever-growing and ever-changing field, and as search engines and the Web change, so does SEO.

There are no classrooms or college courses, and no single site, conference series, or book, that can help you keep up with all of those changes.

Paying attention to a lot of blogs, news reports, press releases, and other sources of information can provide some insights about changes in SEO, and discussions at forums, conferences, and social sites can present a lot of signals and noise about what might be new in search. It’s not always easy, and not always even possible, to distinguish the signals from the noise.

I look at a lot of patent filings and papers from the search engines here because they can provide views of how search engines may work from the perspective of the search engines. I consider them primary sources because they come directly from the search engines, but even those sources often only provide glimpses of possibilities rather than actual insights into how search engines function.

Perhaps the best value that may be taken from search engine patent filings isn’t so much the processes that they describe, but rather the hints of assumptions behind some of the methods and systems that they present.

A recently published patent application from Yahoo explores a different way of thinking about some pages that the search engine may try to include in search results, such as a set of bookmarks, or a playlist or web directory pages.

With an increase in recent years of social sites that contain lists of sites or songs or videos, search engines may be considering ways to include such collection pages in search results in a manner that can be helpful to searchers.

Not surprisingly, the inventor listed in this patent filing from Yahoo is Joshua Schachter, the founder of Del.icio.us, Yahoo’s bookmarking site and one of the best-known bookmarking sites. He recently left the search engine.

What are Base Items, Base Collection Items and Non-Base Collection Items?

Usually, when you perform a search at a search engine, your search will include only one type of result at a time, such as web pages, images, files, songs, videos, merchandise, or some other type of result. We do sometimes see blended results at the major search engines, where different types of results, such as web pages, news pages, and images, are mixed together.

The type of item against which a search is run can be referred to as the “base type” of the search, and items belonging to a base type of a search can be referred to as “base items”. The base items for a video search would be videos.

Some potential search results may be collections of base items, and can be referred to as “collection items.”

Some of those collections of base items may also be base items – for example, a web page containing a list of other web pages, such as a web directory page. These collection items that are also base items can be referred to as “base collection items”.

When someone searches for a particular base type, the search results they see may include base collection items and base items that are not collections. A Web search for “Manhattan hotels” could include collection item pages such as a travel site page that lists web pages for Manhattan hotels, and also base item results – web pages for specific Manhattan hotels.

There are collection items that aren’t also base items which may also be useful to a searcher. A person’s set of “bookmarks” is a collection of a set of web pages, yet a bookmark set is not itself a web page. Bookmark sets usually aren’t listed in search results in searches for web pages.

Someone’s playlist for songs wouldn’t show up in a song specific search, and a person’s playlist for videos wouldn’t show up in a video specific search, since those playlists aren’t songs or videos. But one of those playlists might be a very good result for those searches.

Playlists like those are collection items, but they aren’t base items in a song or video search. They can be referred to as “non-base collection items”.
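The three categories can be sketched in a few lines of Python. This is only an illustration of the definitions above, not anything taken from the patent filing: the `Item` class, the `Kind` enum, and the `classify` function are all hypothetical names, and an item is treated as a collection when it lists at least one item of the search's base type.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Kind(Enum):
    BASE_ITEM = "base item"
    BASE_COLLECTION = "base collection item"
    NON_BASE_COLLECTION = "non-base collection item"


@dataclass
class Item:
    name: str
    item_type: str  # e.g. "web page", "song", "bookmark set"
    members: list = field(default_factory=list)  # items it collects, if any


def classify(item: Item, base_type: str) -> Optional[Kind]:
    """Classify an item relative to the base type of a search."""
    is_base = item.item_type == base_type
    # Assumption: a collection lists at least one item of the base type.
    is_collection = any(m.item_type == base_type for m in item.members)
    if is_base and is_collection:
        return Kind.BASE_COLLECTION
    if is_base:
        return Kind.BASE_ITEM
    if is_collection:
        return Kind.NON_BASE_COLLECTION
    return None  # unrelated to this search's base type


hotel = Item("Hotel A", "web page")
directory = Item("Manhattan hotels directory", "web page", [hotel])
bookmarks = Item("A traveler's bookmarks", "bookmark set", [hotel])

for candidate in (hotel, directory, bookmarks):
    print(candidate.name, "->", classify(candidate, "web page").value)
```

In a web page search, the hotel page is a base item, the directory page is a base collection item, and the bookmark set is a non-base collection item.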

Sometimes, the identification of relevant non-base collection items could be more useful to someone searching than the identification of relevant base items. For example, a link to a playlist that contains links to 40 songs from a particular musician can be more useful than a link to one song from the same artist.

It could be helpful to searchers if a search engine provided information about:

  • Relevant base items,
  • Relevant base collection items, and
  • Relevant non-base collection items

This is especially true if the search engine displays that information in a way that enables searchers to determine the relative relevance of those different items.

That’s the focus of the patent filing from Yahoo:

Techniques for including collection items in search results
Invented by Joshua E. Schachter
US Patent Application 20080147640
Published June 19, 2008
Filed: December 19, 2006

Abstract

Techniques are provided for including collection items in the ranked set of search results that are returned to a user in response to a search query. Collection scoring techniques are also provided for generating relevance scores for collection items in a different manner than relevance scores are generated for base items that are not collections.

The collection scoring techniques may be applied to non-base collection items, base collection items, or both.
Items that match the search query, including base items and collection items, are ranked in a unified ranking based on their respective relevance scores, thereby allowing searchers to readily determine the relevance ranking of matching collection items relative to matching base items.

There are some interesting assumptions in this patent filing. One of them is that:

…users are more likely to use non-base collection items to find the information they are seeking when non-base collection items are ranked highly in an integrated ranked set of search results, rather than presented separately from the base item rankings.

The document also tells us that the search engine could provide relevance scores for collection items in a different manner than for base items that are not collections. So, a directory web page, which is a collection item, might be ranked differently than a web page that isn’t a collection item. For instance, a directory web page may be ranked in part based upon the rankings of the pages that it lists.
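As a rough sketch of what a unified ranking could look like, the snippet below merges base item results and collection item results into one list ordered by relevance score. The function name, the titles, and the scores are invented for illustration. The filing doesn't publish code or real scores, and how each score gets calculated is left out here.

```python
def unified_ranking(base_results, collection_results):
    """Merge two lists of (title, score) pairs into one list, best score first."""
    merged = list(base_results) + list(collection_results)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


# Made-up scores for a "Manhattan hotels" web search.
base = [("Hotel A home page", 0.82), ("Hotel B home page", 0.64)]
collections = [("A traveler's Manhattan hotel bookmarks", 0.77)]

ranked = unified_ranking(base, collections)
for title, score in ranked:
    print(f"{score:.2f}  {title}")
```

Here the bookmark set lands between the two hotel pages instead of in a separate block of results, which is the integrated presentation that the assumption quoted above is about.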

Examples of Collection Items

Tags — A tag can represent a collection of items that have been tagged with a given tag, and can be applied to a number of different types of base items, such as web pages, or event records, or songs, or videos, etc.

Search — A collection of items in a set of search results generated when a search is executed, and can include any type of base item. A search for metadata associated with images represents the collection of images produced as a result of that search.

Bookmark sets — A collection of items bookmarked by someone, such as a set of the bookmarker’s favorite web pages.

Also See — An “also see” list is a collection of items somehow logically related to a given item. An online encyclopedia entry on a topic may contain a list of “also see” links to other encyclopedia entries for related topics. That list isn’t a web page, but it contains a collection of web pages containing those related topics.

Playlist — A collection of playable media items such as songs or videos.

Wish list — A collection of purchasable items placed by someone into that wish list.

Directories — A collection of items assigned into a category within a directory, such as a list of web sites created to help people find information on different topics.

Travel Itinerary — A collection of travel items, such as “ports of call, flights, car rentals, tours, etc.”

Registering Collection Items

When a search engine indexes base items and collection items, it may treat each differently when it presents them in search results, and when it scores both types of items for ranking.

A web directory page that includes a “collection” of other web pages may be discovered when the search engine crawls the web. But, a set of bookmarks that includes a collection of web pages will likely not be crawled in the same way.

Instead, a web service that people use to create and share bookmark sets could register those bookmark sets with a web page search engine. A media playback site that enables people to define and share playlists of songs could register those playlists with a music search engine.

The bookmarking site, or the playlist site may allow their users to make their bookmarks or their playlists public and searchable, or it may do that automatically without giving the option to the creators of those sets of collections.

Those sites may also have a scoring mechanism for determining which bookmark sets, or playlist sets, or image sets are most useful, and might only register with a search engine the sets associated with “a usefulness score that exceeds a given threshold.”
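A minimal sketch of that filtering step might look like the following. The dictionary keys (`name`, `public`, `usefulness`), the sample data, and the threshold value are assumptions of mine. The patent filing doesn't say how a usefulness score would be calculated or what threshold a site might choose.

```python
def sets_to_register(collections, threshold):
    """Return the names of public collections whose usefulness exceeds the threshold."""
    return [
        c["name"]
        for c in collections
        if c["public"] and c["usefulness"] > threshold
    ]


# Hypothetical bookmark sets held by a bookmarking site.
bookmark_sets = [
    {"name": "SEO reading list", "public": True, "usefulness": 0.9},
    {"name": "Misc links", "public": True, "usefulness": 0.4},
    {"name": "Private research", "public": False, "usefulness": 0.95},
]

print(sets_to_register(bookmark_sets, threshold=0.5))
```

Only the first set would be registered: the second falls below the threshold, and the third isn't public.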

A search engine could use other methods of discovering collections of items, and the patent filing describes a few. Regardless of which method is used, some other information might be gathered about the collection items.

Registration Metadata

Some examples of information that might be looked at when non-base collection items (playlists, bookmarks, wishlists, etc.) are registered can include:

  • Information identifying the base items in the collection represented by the non-base collection item;
  • Information about the creator of the non-base collection item, such as:
    • An indication of the reputation of the creator,
    • An indication of the interests of the creator,
    • An indication of the expertise of the creator,
    • An indication of the education of the creator,
    • An indication of the affiliations of the creator, etc.
  • Information about the non-base collection item, such as:
    • A name assigned to the non-base collection item,
    • Tags that have been assigned to the non-base collection item,
    • An indication of the popularity of the non-base collection item,
    • An indication of the categories to which the non-base collection item belongs (e.g. the fact that a playlist is for country music, that a wish list is full of items required to set up the ultimate home theater, etc.).
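One way to picture that registration metadata is as a simple record. The class and field names below are hypothetical, as are the types and sample values. The filing lists the kinds of information but doesn't define a format for them.

```python
from dataclasses import dataclass, field


@dataclass
class CreatorInfo:
    reputation: float = 0.0
    interests: list = field(default_factory=list)
    expertise: list = field(default_factory=list)
    education: str = ""
    affiliations: list = field(default_factory=list)


@dataclass
class CollectionRegistration:
    name: str
    base_item_ids: list  # identifies the base items in the collection
    creator: CreatorInfo
    tags: list = field(default_factory=list)
    popularity: float = 0.0
    categories: list = field(default_factory=list)


# A made-up playlist registered with a music search engine.
record = CollectionRegistration(
    name="Road trip country mix",
    base_item_ids=["song-101", "song-202"],
    creator=CreatorInfo(reputation=0.8, interests=["country music"]),
    tags=["country", "road trip"],
    popularity=0.6,
    categories=["country music"],
)
print(record)
```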

Collection Relevance Scoring

The patent filing includes a number of factors that the search engine might look at when it considers inserting collection items into base item search results. One way of determining the relevance of a collection item to a certain query might involve creating a score based upon characteristics of the base items included in the collection item.

A relevance score for the collection item may be built from the individual relevance scores of the base items it contains by:

  1. Calculating a relevance score for each base item in the collection using a conventional base-item scoring technique,
  2. Averaging those relevance scores across all of the base items in the collection, and
  3. Creating a relevance score for the collection item that includes, at least in part, that average relevance score of the base items.
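The three steps above can be sketched as follows. The per-item scores are made up and stand in for whatever a conventional base-item scoring technique would produce. The `other_signals` and `average_weight` parameters are my own assumption about how "at least in part" might be handled, since the filing doesn't give a formula.

```python
from statistics import mean


def collection_relevance(base_item_scores, other_signals=0.0, average_weight=1.0):
    """Blend the average base-item relevance with any other collection signals."""
    if not base_item_scores:
        raise ValueError("a collection needs at least one scored base item")
    average = mean(base_item_scores)  # step 2: average the per-item scores
    # Step 3: the average makes up average_weight of the final score.
    return average_weight * average + (1.0 - average_weight) * other_signals


# Step 1 is assumed to have happened already: one score per song in a playlist.
playlist_scores = [0.9, 0.6, 0.3]

score = collection_relevance(playlist_scores)
blended = collection_relevance(playlist_scores, other_signals=1.0, average_weight=0.75)
print(round(score, 2), round(blended, 2))
```

With the default weight, the playlist's score is simply the average of its songs' scores. Lowering `average_weight` lets other signals, such as the registration metadata described earlier, contribute to the final score.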

Other characteristics of the base items within a collection item that might feed into the collection item’s relevance score include:

  • User ratings assigned to songs in a playlist,
  • Download frequencies of songs in a playlist,
  • The average duration of songs in a playlist,
  • The number of sales of each item in a wish list,
  • Quality ratings associated with each item in a wish list,
  • How many people indicated that they plan to attend events assigned to a tag,
  • Safety ratings associated with countries included in an itinerary, and
  • The popularity of travel items in an itinerary.
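To show how characteristics like these might be folded into a single number, here is a small sketch that averages each characteristic across the songs in a playlist, scales it into the range 0 to 1, and adds the results up with weights. The feature names, weights, and maximum values are all invented for the example. The patent filing names the characteristics but not how they would be combined.

```python
def feature_averages(songs, features):
    """Average each named feature across the songs in a playlist."""
    return {f: sum(s[f] for s in songs) / len(songs) for f in features}


def characteristics_score(songs, weights, maxima):
    """Weighted sum of per-feature averages, each scaled into the range 0..1."""
    averages = feature_averages(songs, weights.keys())
    return sum(weights[f] * min(averages[f] / maxima[f], 1.0) for f in weights)


# Hypothetical playlist: user ratings out of 5 and download counts per song.
playlist = [
    {"rating": 4.0, "downloads": 1200},
    {"rating": 5.0, "downloads": 800},
]

characteristics = characteristics_score(
    playlist,
    weights={"rating": 0.5, "downloads": 0.5},
    maxima={"rating": 5.0, "downloads": 2000},
)
print(round(characteristics, 2))
```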

Conclusion

While I’ve presented a number of details about how search engines might treat collection items like playlists and bookmark sets in search results, the patent application goes into a lot more depth.

If you’re interested in how a search engine might decide whether to include a person’s travel itinerary or playlist of videos or bookmarks in a list of web page results, you may want to spend some time with this patent application.

As I noted at the start of this post, search engines are evolving, and the growth of social bookmarking and sharing sites provides opportunities for search engines to show searchers collections of information that may be helpful to them.

The next time you see a “collection item” in a set of Web search results, consider that the relevance ranking factors for that result might be different than the ones for the Web page results displayed by the search engine.

6 thoughts on “How a Search Engine Might Rank Bookmark Sets, Playlists, Directory Pages, and other Collection Items”

  1. Nice stuff, bookmarked the patent for some reading later. I like new perspectives and anything inclusive of human interaction without creating greater risk of spam. Many of these quasi-personalized signals seem to offer that potential IMO.

    I like the delineation of content classifications and scoring models – I wonder how that pans out as far as computational resources of course.

    Anyway, thanks as always… some reading with tea after supper methinks.

  2. Hi Dave,

    I agree – I liked the different approach that this patent application provides, but I wonder how prone it might be to being manipulated by people setting out to spam search engines.

    I did think that this was pretty creative. I suspect that classification and scoring doesn’t necessarily have to be done real time for the most popular queries, which could save some computational resources.

    Thank you.

  3. It strikes me that publishers could use a “collectionmap.xml”, cf., a sitemap.xml, to enumerate the contents of a collection to a search engine?
    It’d be a big file, though!

  4. Hi Tony,

    An XML feed might be a good way of sending information to a search engine, even from a large site like del.icio.us, Flickr, or last.fm.

    The patent application doesn’t go into depth on how a site might register collection data, such as playlists, but it does say that a conventional crawling method like those used to crawl web pages probably wouldn’t be a good approach. They do leave the specifics open to different possibilities.

    [0027] A separate mechanism will typically have to be used to obtain the information about the collection items than is used to collect information about base items. For example, one technique for obtaining information about web pages involves “crawling the web” by following links between web pages. However, conventional web crawlers are not designed to obtain information about bookmark sets that users have created to access their favorite web pages. Consequently, a different mechanism must be used to gather information about bookmark sets, to enable bookmark sets to be included in the ranked results of web page searches.

    [0028] Various types of mechanisms may be used to obtain information about non-base collection items. The present invention is not limited to any particular type of non-base collection item discovery mechanism. For example, non-base collection items may be explicitly registered with a search engine by the same mechanism that is used to create the non-base collection items. Thus, a web service that allows users to create and share bookmark sets may register such bookmark sets with a web page search engine. Similarly, a merchant web site that allows users to add items to wish lists may register those wish lists with a merchandise search engine. As yet another example, media playback software that allows users to define and share playlists of songs may be designed to register those playlists with a music search engine.

    It could be a pretty big file. :)

  5. Thanks, Chris.

    Same here – it’s one of the reasons why I like going through patent applications from the search engines so much – you never quite know what you might find.
