Search engines have transformed the way that we locate information and learn about the world around us. When we type a term into a search box, we are presented with pages of search results that bring a wealth of information to our fingertips.
The results that we see often include more than just a list of web pages. A search for [baseball] at Google provides links to web pages, videos, news articles, book results, and related search queries.
The top result I received was a link to the Major League Baseball (MLB) site, with a list of sitelinks to eight additional pages related to that domain. Interestingly, four of those sitelinks point to different subdomains on the MLB site: team pages for the Boston Red Sox, the New York Yankees, the Los Angeles Dodgers, and the Baltimore Orioles.
There may be many pages that show up in search results relevant to a query that we perform. In my search for [baseball], I was shown “Results 1 – 10 of about 197,000,000 for baseball.” I’m not going to look at all 197 million pages, and chances are that I might not make it past the first page of the search results.
The top results for a query term are the ones that most people will visit. If those top results don’t look encouraging, searchers are more likely to change their query terms to something broader or more specific than to keep paging through results.
Search results pages are supposed to be listed in an order that places the most relevant and important results at or near the top of the list. There are times when many of the most relevant pages are from the same domain, so you could possibly have a result from that domain on the first page of search results, and another page from that domain on the fifth page.
Or, you could see results from the same domain filling up a number of spots on the front page of search results.
When multiple search results from the same domain are relevant to a query, search engines have been handling them by showing the most relevant and/or important result as a normal result, and then showing another page from that domain under it as an indented result, possibly with a link to “more results” from that same domain under the indented result. We see this at all of the major commercial search engines.
This indentation of results is helpful for searchers because it provides them with a chance to see that a site might contain multiple relevant pages that may provide information about their query. It is also attractive to site owners, who may like that their pages are shown more than once in top search results in response to a certain query.
A recent patent application published from Microsoft, Domain Collapsing of Search Results (US Patent Application 20080294602), provides some of the technical details behind how search results are indented.
The process itself probably wouldn’t surprise most people who pay a lot of attention to the way that search results are presented to searchers.
Quite simply, a top number of search results are returned from a search index based upon a searcher’s query, and results from the same domain are associated and clustered together so that two or more search results might be presented as a single cluster of search results rather than presented individually. An option to see more search results from the same domain may be provided to the searcher.
A domain is identified under this process by looking at the structure of a page’s URL. When the URL ends with a country-specific tag, the domain consists of the last three dot-separated words of the URL before the first forward slash, so the domain of the URL www.msn.co.in/ is “msn.co.in”. When the URL does not end with a country-specific tag, the domain consists of the last two words of the URL before the first forward slash, so the domain of the URL www.msn.com is “msn.com”.
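To make that rule concrete, here is a minimal sketch in Python of how it might be applied. This is my own illustration, not code from the patent filing: the function name `collapse_domain` and the tiny `COUNTRY_TAGS` set are assumptions, and a real system would need a complete list of country-specific endings.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny sample of country-specific endings.
# A real system would need a complete list.
COUNTRY_TAGS = {"in", "uk", "au", "jp", "br"}


def collapse_domain(url: str) -> str:
    """Return the domain used for grouping, following the rule described above."""
    # urlsplit only recognizes the host when a scheme or "//" is present,
    # so bare URLs such as "www.msn.com" get a "//" prefix first.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Country-specific ending: keep the last three words; otherwise the last two.
    keep = 3 if labels[-1] in COUNTRY_TAGS else 2
    return ".".join(labels[-keep:])


print(collapse_domain("www.msn.co.in/"))  # msn.co.in
print(collapse_domain("www.msn.com"))     # msn.com
```

Taken literally, the rule keeps three words for any country-specific ending, so a host such as www.example.in would come out as “www.example.in” rather than “example.in”. The sketch follows the rule as summarized and does not try to correct for that.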
So, under this approach, when more than one URL from the same domain is in a top number of results for a specific query, those URLs may be clustered together, with the main result, an indented result or results, and a link to more results from that domain.
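The clustering step might look something like the following Python sketch. It is only an illustration under my own assumptions: the names `domain_of` and `collapse_results`, the `max_indented` parameter, the sample country tags, and the example.com URLs are all made up, and the patent filing does not spell out an implementation.

```python
from urllib.parse import urlsplit


def domain_of(url, country_tags=frozenset({"in", "uk", "au"})):
    """Simplified domain rule: last three words for country endings, else two."""
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    labels = host.split(".")
    keep = 3 if labels[-1] in country_tags else 2
    return ".".join(labels[-keep:])


def collapse_results(ranked_urls, max_indented=1):
    """Group ranked URLs by domain, keeping each cluster at the position
    of that domain's best-ranked result."""
    clusters = {}  # dicts preserve insertion order, so rank order is kept
    for url in ranked_urls:
        clusters.setdefault(domain_of(url), []).append(url)

    collapsed = []
    for domain, urls in clusters.items():
        main, rest = urls[0], urls[1:]
        collapsed.append({
            "domain": domain,
            "main": main,                         # shown as the normal result
            "indented": rest[:max_indented],      # shown indented beneath it
            "more_results": len(rest) > max_indented,  # show a "more results" link
        })
    return collapsed


# Made-up sample ranking, best result first.
ranked = [
    "www.example.com/a",
    "www.example.org/",
    "news.example.com/b",
    "shop.example.com/c",
]
for cluster in collapse_results(ranked):
    print(cluster)
```

In this sample, the three example.com pages end up in one cluster with one indented result and a “more results” link, while example.org stays as an ordinary single result.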
Domain Collapsing and Page Titles
Most people who have used one of the major search engines for a while have probably seen indented search results at some point in time, and the process above probably comes as no surprise. I don’t know if it’s novel enough to be the subject of a patent filing, but there was one idea presented that was interesting and might be new to most people who have seen indented search results before.
Each page of a site should ideally have a unique title that describes the content of the page that it appears upon. Unfortunately, some pages of a domain share the same title.
When more than one page from the same domain is determined to be among the most relevant and important for the same query term, and those pages share the same title, an indented result might not be shown to searchers.
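One way to picture that title check is the short Python sketch below. It is a guess at the general idea rather than the method in the patent filing: the function names are hypothetical, and treating titles as equal after ignoring case and extra spacing is my own assumption.

```python
def normalize_title(title):
    """Assumed comparison rule: ignore case and extra whitespace."""
    return " ".join(title.lower().split())


def indented_candidates(main, others):
    """Drop same-domain pages whose title duplicates one already shown.

    `main` and each item in `others` are (url, title) pairs from the same
    domain, in rank order.
    """
    seen = {normalize_title(main[1])}
    kept = []
    for url, title in others:
        key = normalize_title(title)
        if key in seen:
            continue  # same title as a page already shown, so skip it
        seen.add(key)
        kept.append((url, title))
    return kept


# Made-up sample pages from a single domain.
main = ("example.com/", "Acme Widgets")
others = [
    ("example.com/a", "Acme Widgets"),
    ("example.com/b", "Widget Pricing | Acme"),
    ("example.com/c", "acme  widgets"),
]
print(indented_candidates(main, others))
```

Here only the pricing page survives as a possible indented result, because the other two pages repeat the title of the main result.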
Unanswered Questions Involving Domain Collapsing
This patent filing doesn’t address what happens when there are different sites that share a domain, like at wordpress.com.
We are told that a search engine can turn on or turn off domain collapsing, and can possibly enable searchers to turn the feature on or off too. But, can domain collapsing be turned on for some kinds of queries and not others?
There is no discussion of site links – which are links that might appear below the top search result for a query, and which are intended to be navigational shortcuts to pages that are related to that top listing.
Those site links appear to work differently from this domain collapsing process. Collapsed results are pages that show up as relevant and important to the query searched for, while site links point to pages that might be the final destination when someone is performing a navigational query.
I remember going to the library and finding books by looking through big books of categories, searching at dumb terminals or microfilm indexes, and walking through bookshelves and scanning the titles of books.
The web brings a library of information to our fingertips, and search engines provide indexes of information which can be searched much more quickly than the shelves of a library and can potentially deliver much more information to us.
The way that search results are presented to us determines what we end up finding when we are looking for information, or when we are attempting to perform a task on the Web. It’s worth paying attention to how a search engine might cluster together results from the same domain or how it might provide results of different types such as news or web results or videos or books.
While indented results might not be something new to most site owners or searchers, one of the biggest takeaways from this patent application for site owners might be to make sure that each page of their site has a unique page title. That way, if a search engine determines that more than one page of the site might be relevant for a query, it can present the pages together as a clustered result, and show searchers that the site contains multiple pages that are relevant and important for that query term.