The number of web sites that use content management systems, such as blogs, ecommerce sites, and wikis, has grown tremendously over the past few years. How might that affect how search engines index the pages of those sites?
A new Yahoo Research paper, Page-level Template Detection via Isotonic Smoothing (pdf), discusses some of the problems that arise when so many sites use templates, and describes a method for trying to determine whether a page uses a template. Here’s a snippet from the paper:
The increased use of content-management systems to generate webpages has significantly enriched the browsing experience of end users; the multitude of site navigation links, sidebars, copyright notices, and timestamps provide easy to access and often useful information to the users.
From an objective standpoint, however, these “template” structures pollute the content by digressing from the main topic of discourse of the webpage.
Continue reading Yahoo Research Looks at Templates and Search Engine Indexing
This is a discussion of a Microsoft patent granted today that may not have been implemented, and may never be. It isn’t clearly written, but it’s worth discussing…
When you perform a search at a search engine, the page that shows the results of your query is often referred to as a search results page.
Search engines don’t like to show a link to the same page more than once in their search results pages, at least in the unpaid Web search part of those pages. But most search engines also show advertisements on many search results pages, and those ads look similar to the Web search results.
It’s also possible that a search query using multiple terms, each of which an advertiser may be bidding upon, may cause a page to show up in paid results more than once.
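To make the duplication problem concrete, here is a minimal sketch of one way a results page could be assembled so that no page appears twice, whether the repeat is within the paid results or between the paid and unpaid results. This is only my own illustration of the idea and not the method claimed in the patent; the function names and the crude URL normalization are assumptions.

```python
from typing import List, Tuple


def normalize(url: str) -> str:
    """Crude URL normalization (an assumption for this sketch):
    lowercase, drop the scheme, a leading 'www.', and any trailing slash."""
    u = url.strip().lower()
    for prefix in ("https://", "http://"):
        if u.startswith(prefix):
            u = u[len(prefix):]
            break
    if u.startswith("www."):
        u = u[4:]
    return u.rstrip("/")


def dedupe_results(paid: List[str], organic: List[str]) -> Tuple[List[str], List[str]]:
    """Keep the first appearance of each page. Paid results are scanned first,
    so a page that appears in both lists is dropped from the organic list."""
    seen = set()
    unique_paid, unique_organic = [], []
    for source, target in ((paid, unique_paid), (organic, unique_organic)):
        for url in source:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                target.append(url)
    return unique_paid, unique_organic


# Hypothetical example data: two ads point at the same page,
# and that page also ranks in the unpaid results.
paid_ads = ["http://www.example.com/shoes/", "https://example.com/shoes"]
web_results = ["http://example.com/shoes", "http://example.org/boots"]
print(dedupe_results(paid_ads, web_results))
```

Scanning the paid list first is a design choice made only for this example; a search engine could just as easily keep the unpaid listing and drop the ad.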
Continue reading Is Microsoft Removing Web Results When the Same Page Also Appears in Paid Results?
As a web site owner or online advertiser, it often makes good sense to look over statistics on how people use your web site or your ads, to see if pages might be changed to make them more user friendly and to increase the number of sales or conversions that you make.
You might test different landing pages when you use paid advertising, or move different elements around on your site’s pages to see how people react to those changes.
Search engines often do the same type of thing, not only with the layout of their pages, but also with the results that they may present to searchers.
A new patent assigned to IAC Search and Media, Inc. (owners of Ask.com) describes how user data might be analyzed to help improve the look and feel of results pages, and the rankings of results, shown to users of a search engine.
Continue reading Ask.com Patent on Optimizing Search Results Pages Based Upon User Activity
If you like taking a peek under the hood of a search engine, and seeing how it might work, a new patent granted to Google provides some interesting insights.
It describes how a search might start at a standard index and, if there aren’t enough results within that index, look for more results in an extended index to return to the searcher.
System and method for searching an extended database
Invented by Kourosh Gharachorloo, Fay Wen Chang, Deborah Anne Wallach, Sanjay Ghemawat, and Jeffrey Dean
Assigned to Google
US Patent 7,174,346
Granted February 6, 2007
Filed September 30, 2003
Once a search query is received from a user, a standard index is searched based on the search query. The standard index forms part of a set of replicated standard indexes having multiple instances of the standard index. A signal is then determined based on the search of the standard index. When the received signal meets predefined criteria, an extended index is searched. The extended index forms part of a set of extended indexes having at least one instance of the extended index. There are fewer instances of the extended index than instances of the standard index. Extended search results are then obtained from the extended index and at least a portion of the extended search results is transmitted towards a user.
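The flow in that abstract can be sketched in a few lines. This is a toy illustration and not Google's implementation: I'm assuming the "signal" is simply the number of results found in the standard index, and that the "predefined criteria" is a minimum result count. The names `search_index`, `tiered_search`, and `min_results` are hypothetical, and the indexes are plain dictionaries mapping terms to document IDs.

```python
from typing import Dict, List, Set

Index = Dict[str, Set[str]]  # term -> IDs of documents containing that term


def search_index(index: Index, query: str) -> List[str]:
    """Return the sorted IDs of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


def tiered_search(query: str, standard: Index, extended: Index,
                  min_results: int = 3) -> List[str]:
    """Search the standard index first; consult the extended index only
    when the standard index returns fewer than min_results documents."""
    results = search_index(standard, query)
    signal = len(results)  # assumed signal: how many results were found
    if signal < min_results:
        extra = search_index(extended, query)
        results += [doc for doc in extra if doc not in results]
    return results


# Hypothetical toy indexes.
standard_index = {"google": {"a", "b"}, "patent": {"a"}}
extended_index = {"google": {"c"}, "patent": {"c", "d"}}
print(tiered_search("google patent", standard_index, extended_index, min_results=2))
```

Because most queries would be answered from the standard index alone, the extended index sees far less traffic, which fits the abstract's note that there are fewer instances of the extended index than of the standard one.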
Continue reading Google Patent on Extended Search Indexes
There are times when you perform a search in a search engine, and the results just aren’t very relevant.
When you don’t get the results that you expect from an internet or intranet search engine, is it because the search engine isn’t very good, or is it because there isn’t much indexable content related to that search on the web or in the intranet document repository?
A new patent application discusses how the folks who run search engines might identify difficult queries, where the search engine may not have collected much content on certain topics. The process in the patent filing gives search engines the chance to suggest other queries that may lead searchers to the answers they are looking for, or to direct their indexing efforts toward filling those gaps.
The best introduction to the patent filing is probably a couple of pages from IBM which discuss the efforts of the researchers who came up with this process:
Continue reading Difficult Queries and Identifying Missing Content in Search Engines
I joined a friend on a road trip yesterday afternoon down to Washington, DC, for a bowl of chili and a book report.
The chili was pretty good, courtesy of the folks at Hard Times Cafe. The book report was even better, as delivered by P. J. O’Rourke at the Cato Institute. The politics of the event were something I pretty much ignored. What excited me was the concept behind the book: a lesson for people who want to write something, but can’t think of what to write.
A snippet from the publisher’s description of the book:
In On The Wealth of Nations, America’s most provocative satirist, P. J. O’Rourke, reads Adam Smith’s revolutionary The Wealth of Nations so you don’t have to. Recognized almost instantly on its publication in 1776 as the fundamental work of economics, The Wealth of Nations was also recognized as really long: the original edition totaled over nine hundred pages in two volumes—including the blockbuster sixty-seven-page “digression concerning the variations in the value of silver during the course of the last four centuries,” which, “to those uninterested in the historiography of currency supply, is like reading Modern Maturity in Urdu.”
Continue reading On Rewriting the Wealth of Nations
Knowing something about the language used in a query might help a search engine decide which pages to show a searcher. A search engine wants to lead its users to pages they can read. A recent Microsoft patent application explores how language types can be used in ranking pages in search results.
Language types can be seen as a measure of relevance because they can help find pages relevant for a search. They are considered a “query-dependent” measure of relevance, because while the language type for a page can be identified before anyone performs a search that might include the page, the language used in the query influences which results are shown.
Query-independent measures, or attributes, are different. I wrote previously about a couple of other Microsoft patent applications which this one notes are related, in a post titled Ranking Search Results by File Type and by Click Distance.
Those two measures are considered “query independent,” because the words used in a query that might return those pages are irrelevant to the ranking method.
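The difference between the two kinds of measures can be shown with a small sketch. It is only an illustration under my own assumptions, not the scoring described in the patent application: each page carries a language identified ahead of time and a query-independent score, and a page whose language differs from the query's language has that score multiplied by a made-up `mismatch_penalty`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    url: str
    language: str        # identified at index time, before any query arrives
    static_score: float  # hypothetical query-independent score


def rank(pages: List[Page], query_language: str,
         mismatch_penalty: float = 0.5) -> List[Page]:
    """Order pages by static score, scaling down pages whose language
    differs from the language of the query (the query-dependent part)."""
    def score(page: Page) -> float:
        if page.language == query_language:
            return page.static_score
        return page.static_score * mismatch_penalty

    return sorted(pages, key=score, reverse=True)


# Hypothetical pages: the same two pages rank differently
# depending on the language of the query.
pages = [Page("example.de/seite", "de", 0.9), Page("example.fr/page", "fr", 0.6)]
print([p.url for p in rank(pages, "fr")])
print([p.url for p in rank(pages, "de")])
```

The static score never changes between the two calls; only the language of the query does, which is what makes the language measure query dependent.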
Continue reading Penalizing Pages in Search Results Based upon Language (Except English)
In the early days of the web, before search engines were one of the primary ways to get around, building friendships and business relationships with others online and linking to each other’s sites was a great way of delivering value to your visitors, and of having other site owners entrust their visitors to your site.
While many link building efforts these days focus upon gaining as many links as possible to a site, to increase its link popularity and its rankings in search engines, that older aim of relationship building is still a viable and valuable way to bring traffic to your pages.
I wrote about that approach in an article for the December issue of Target Marketing Magazine.
In addition to the print version, it’s also available online at: SEM: Make the Right Connections: Build Web site traffic through links
Continue reading Article on Link Building in Target Marketing Magazine