If you like taking a peek under the hood of a search engine, and seeing how it might work, a new patent granted to Google provides some interesting insights.
It describes how a search might start with a standard index and, if that index doesn’t hold enough results, draw more from an extended index to return to the searcher.
System and method for searching an extended database
Invented by Kourosh Gharachorloo, Fay Wen Chang, Deborah Anne Wallach, Sanjay Ghemawat, and Jeffrey Dean
Assigned to Google
US Patent 7,174,346
Granted February 6, 2007
Filed September 30, 2003
Once a search query is received from a user, a standard index is searched based on the search query. The standard index forms part of a set of replicated standard indexes having multiple instances of the standard index. A signal is then determined based on the search of the standard index. When the received signal meets predefined criteria, an extended index is searched. The extended index forms part of a set of extended indexes having at least one instance of the extended index. There are fewer instances of the extended index than instances of the standard index. Extended search results are then obtained from the extended index and at least a portion of the extended search results is transmitted towards a user.
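Here is a rough sketch of how that fallback might look in code. It is only an illustration under simple assumptions, not the method claimed in the patent: the `Index` class, the `tiered_search` function, and the `min_results` threshold are all hypothetical names, and the "signal" is assumed to be nothing more than the number of results found in the standard index.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """Toy inverted index: maps a term to the set of document ids containing it."""
    postings: dict = field(default_factory=dict)

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return set()
        # A document matches only if it contains every query term.
        result_sets = [self.postings.get(term, set()) for term in terms]
        return set.intersection(*result_sets)


def tiered_search(query, standard, extended, min_results=3):
    """Search the standard index first, then fall back to the extended index.

    Assumption: the signal is the count of standard results, and the
    predefined criterion is that the count falls below min_results.
    """
    results = standard.search(query)
    signal = len(results)
    used_extended = signal < min_results
    if used_extended:
        # Only now pay the cost of touching the (less replicated) extended index.
        results = results | extended.search(query)
    return sorted(results), used_extended
```

Because most queries would be answered from the standard index alone, a design like this could get by with fewer copies of the extended index, which matches the abstract's note that there are fewer instances of it.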
There are times when you perform a search in a search engine, and the results just aren’t very relevant.
When you don’t get the results that you expect from an internet or intranet search engine, is it because the search engine isn’t very good, or is it because there isn’t much indexable information on the web or intranet document repository that contains content related to that search?
A new patent application discusses how the folks who run search engines might identify difficult queries where there may not be much content collected by the search engine on certain topics. The process in the patent filing gives search engines the chance to suggest queries that may lead searchers to the answers they are looking for, or to direct the engines’ indexing efforts toward filling those gaps.
The best introduction to the patent filing is probably a couple of pages from IBM which discuss the efforts of the researchers who came up with this process:
I joined a friend on a road trip yesterday afternoon down to Washington, DC, for a bowl of chili and a book report.
The chili was pretty good, courtesy of the folks at Hard Times Cafe. The book report was even better, as delivered by P. J. O’Rourke at the Cato Institute. The politics of the event were something I pretty much ignored. What excited me was the concept behind the book: a lesson to people who would write something, but can’t think of what to write.
A snippet from the publisher’s description of the book:
In On The Wealth of Nations, America’s most provocative satirist, P. J. O’Rourke, reads Adam Smith’s revolutionary The Wealth of Nations so you don’t have to. Recognized almost instantly on its publication in 1776 as the fundamental work of economics, The Wealth of Nations was also recognized as really long: the original edition totaled over nine hundred pages in two volumes—including the blockbuster sixty-seven-page “digression concerning the variations in the value of silver during the course of the last four centuries,” which, “to those uninterested in the historiography of currency supply, is like reading Modern Maturity in Urdu.”
Knowing something about the language used in a query might help a search engine decide which pages to show a searcher. A search engine wants to lead its users to pages they can read. A recent Microsoft patent application explores how language types can be used in ranking pages in search results.
Language types can be seen as a measure of relevance because they can help find pages relevant for a search. They are considered a “query-dependent” measure of relevance, because while the language type for a page can be identified before anyone performs a search that might include the page, the language used in the query influences which results are shown.
Query-independent measures, or attributes, are different. I wrote previously about a couple of other Microsoft patent applications which this one notes are related, in a post titled Ranking Search Results by File Type and by Click Distance.
Those two measures are considered “query independent,” because the words used in a query that might return those pages are irrelevant to the ranking method.
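To make the distinction concrete, here is a small sketch of how the two kinds of measures might be combined. This is my own illustration, not the scoring described in the Microsoft filings: the `rank_pages` function, the `static_score` field (standing in for something like file type or click distance), and the `language_boost` weight are all assumptions.

```python
def rank_pages(pages, query_language, language_boost=1.0):
    """Order pages by a query-independent score plus a query-dependent bonus.

    pages: list of dicts with 'url', 'language', and 'static_score' keys.
    """
    def score(page):
        # Query-independent: computed ahead of time, the same for every query.
        base = page["static_score"]
        # Query-dependent: the page's language is known in advance, but
        # whether it earns a bonus depends on the language of this query.
        bonus = language_boost if page["language"] == query_language else 0.0
        return base + bonus

    return [page["url"] for page in sorted(pages, key=score, reverse=True)]
```

The same three pages can come back in a different order depending on the language of the query, even though nothing about the pages themselves has changed.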
In the early days of the web, before search engines were one of the primary ways to get around, building friendships and business relationships with others online and providing links to each other’s sites was a great way of delivering value to your visitors, and of having other site owners entrust their visitors to your site.
While many link building efforts these days focus upon gaining as many links as possible to a site, to increase its link popularity, and rankings in search engines, that older aim of relationship building is still a viable and valuable approach to bring traffic to your pages.
I wrote about that approach in an article for the December issue of Target Marketing Magazine.
In addition to the print version, it’s also available online at: SEM: Make the Right Connections: Build Web site traffic through links
It can be really helpful to have a full set of tools that can allow you to manage and control information related to SEO projects.
I had the chance to talk with Michael Jensen and Aaron Stewart of Solo SEO while at Pubcon a couple of weeks back, and offer some input and suggestions on their new SEO project management software, which launched today. They already had a nice set of tools aimed at helping someone working on an SEO project collect information about their efforts, and track those carefully. But they were also willing to take notes, and listen carefully to some new ideas.
Andy Beal describes a number of the features that they offer with their Solo SEO software, and Michael Jensen talks about it more in their blog – Announcing SoloSEO.com, a new SEO Project Management solution.
I don’t normally write about products and software here, but I’m making an exception in this case because the creators of this software have shown that they are paying attention to what people within the industry are asking for, and that they are listening. And, there are some nice tools included here that make the management of SEO projects easier.
One of the members of Cre8asite Forums has a couple of sites that he’s filled with images of the locations of his sites. He’s a talented photographer, in addition to a skilled web master, and the pictures he has on his site are terrific. He has also placed those images under licenses from Creative Commons.
Because of the licenses, he’s had people use images from his site on their own noncommercial web sites, with links pointing back to his sites. He’s also had inquiries from people wanting to use his images in commercial works. Since the images are likely to be of interest to people who may want to find out more about what he has to offer, having links back to his site brings traffic to his pages from people who could possibly become customers of his.
The beauty of Creative Commons licenses is that they inform people that they could possibly use material created by other people under conditions expressed in the licenses. They don’t harm people’s rights under copyright law, but rather make communication about possible uses of those materials easier. The Creative Commons pages show how to use a license, and provide many examples.
Google and Creative Commons
This is the first part in what is now a three-part series, with the second part available at 20 More Ways that Search Engines May Rerank Search Results, and a third part at Another 10 Ways Search Engines May Rerank Search Results. It may be time for a fourth part soon. (Added 2013-08-31)
Search engines try to match words used in queries with words found on pages or in links pointing to those pages when providing search results.
Often, the order in which pages are returned to a searcher is based upon an indexing of text on those pages, text in links pointing to those pages, and some measure of importance based upon link popularity.
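A toy version of that kind of baseline ordering might look like the sketch below. It is a simplification for illustration only, not how any particular search engine scores pages: the function names, the page fields, and the three weights are all made up, and matching is reduced to counting exact word occurrences.

```python
def term_matches(query, text):
    """Count how many times the query's words appear in a piece of text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def score_page(query, page, w_body=1.0, w_anchor=2.0, w_links=0.5):
    """Combine on-page text, anchor text, and link popularity (weights are assumed)."""
    body = term_matches(query, page["body"])
    anchor = sum(term_matches(query, text) for text in page["anchors"])
    return w_body * body + w_anchor * anchor + w_links * page["link_popularity"]


def rank(query, pages):
    """Return page URLs ordered from highest to lowest score."""
    ordered = sorted(pages, key=lambda page: score_page(query, page), reverse=True)
    return [page["url"] for page in ordered]
```

Reranking, in these terms, would be any later step that takes the list returned by `rank` and reorders it using additional information.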