I originally wrote the following article a couple of years ago for publication at Website Magazine. It presents one way of thinking about the evolution of search and search engines, and I thought it might be a good idea to share it here as well. I’ve added a few very minor updates to the article.
Search engines have come a long way since their modest beginnings — although you may not have noticed. The major engines such as Google, Yahoo and Bing guard their search secrets closely, so one can never be absolutely certain how they are operating. But they are evolving, and personalization seems to be the wave of the future.
Search engines have already developed through two major stages and now may be on the verge of a third generation. The first stage was based simply on matching keywords in documents, where the same results were shown to all searchers, regardless of who they were or their original search intentions. The second stage, where we may be now, examines how searchers interact with the search engine to predict their intent. Finally, the third stage will attempt to consider the actual interests of searchers and then recommend pages accordingly.
Stage One – Keyword Matching
Before the Web and today’s search engines, searching through a database filled with textual documents meant matching the terms in your query to the exact appearance of those terms in the documents. Sorting documents by relevance or importance would have been a monumental task, if it was possible at all. Some database searches only let you locate documents where certain words appeared within a defined distance of other specified words in the same document. For example, a search for “California beaches” would be effective only if both terms in a document were located within one word of each other.
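That kind of proximity matching can be sketched in a few lines. This is a toy illustration, not any real database engine; the documents and the `within_distance` helper are invented for the example.

```python
def words(text):
    """Lowercase and split a document into a list of words."""
    return text.lower().split()

def within_distance(doc, term_a, term_b, max_gap=1):
    """True if term_a and term_b appear within max_gap words
    of each other anywhere in the document."""
    tokens = words(doc)
    positions_a = [i for i, w in enumerate(tokens) if w == term_a]
    positions_b = [i for i, w in enumerate(tokens) if w == term_b]
    return any(abs(a - b) <= max_gap
               for a in positions_a for b in positions_b)

# A toy "database" of documents
docs = [
    "the beaches of california are sunny",
    "sunny california beaches attract tourists",
    "california has great beaches",
]

# Only the document where the two terms sit next to each other matches.
matches = [d for d in docs if within_distance(d, "california", "beaches")]
```

Notice that documents mentioning both terms still fail the search if the words are too far apart; that rigidity is exactly the limitation described above.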
Then the Web introduced us to an interconnected network of pages that could link to each other using hyperlinks. Search engines evolved to understand differences in the importance of words when they were located in different parts of a page. For example, if you searched for a certain phrase, pages containing those words in their titles and headlines might be considered more relevant than other pages where those words also appeared, but not in those “important” parts of pages.
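The idea of treating some parts of a page as more important than others can be shown with a simple field-weighted score. The field names and weights here are illustrative assumptions, not any engine’s real values.

```python
# Hypothetical weights: a term found in the title counts more than
# the same term found in the body.
weights = {"title": 3.0, "body": 1.0}

def field_score(query, fields):
    """Score a page by counting query terms in each field,
    multiplied by that field's weight."""
    terms = query.lower().split()
    return sum(weights[name] *
               sum(text.lower().split().count(term) for term in terms)
               for name, text in fields.items())

page = {"title": "deep sea fishing guide",
        "body": "tips for fishing far offshore"}
# All three query terms appear in the title; one also appears in the body.
score = field_score("deep sea fishing", page)
```

A page with the query terms only in its body text would score far lower than this one, which carries them in its title.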
Relevance was also inferred from the words used in links to other pages. If a link pointing to a page used the phrase deep sea fishing as anchor text, the page being pointed to would be considered relevant to deep sea fishing. The existence of links to a page has also been used to help define its perceived importance: information about the quality and quantity of links to a page can give search engines a sense of the implied importance of the page being linked to.
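Combining those two signals, anchor text and link counts, might look something like the sketch below. This is a deliberately crude illustration, not any engine’s actual algorithm, and the link data is made up.

```python
from collections import defaultdict

# (source_page, target_page, anchor_text) triples, invented for the example
links = [
    ("a.example", "fish.example", "deep sea fishing"),
    ("b.example", "fish.example", "deep sea fishing charters"),
    ("c.example", "news.example", "daily headlines"),
]

anchor_index = defaultdict(list)   # word -> pages linked with that word
inbound_count = defaultdict(int)   # page -> number of inbound links

for source, target, anchor in links:
    inbound_count[target] += 1
    for word in anchor.lower().split():
        anchor_index[word].append(target)

def score(query, page):
    """Anchor-word matches, weighted by a crude link-count signal."""
    matches = sum(1 for w in query.lower().split()
                  if page in anchor_index[w])
    return matches * inbound_count[page]
```

For the query deep sea fishing, fish.example matches all three words via its anchors and has two inbound links, so it easily outscores news.example.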
However, there is a limit to the effectiveness of this type of keyword matching. When two people perform a search at one of the major search engines, there’s a chance that even if they use the same search terms, they are looking for something completely different. For example, as someone who enjoys a cup of coffee, when I search for java I might want something quite different than a programmer friend looking for technical information on the popular programming language. The term java can mean coffee, a programming language, an island in Indonesia or even something else.
Stage Two – Learning Search Behavior
As search engines progressed and users were given ever more options (Web pages) for finding information, the engines needed to respond with a refined approach to search. The second generation of search engines started asking the question: How do we go about learning intent when someone types a certain phrase into a search box?
Several options surfaced. You could build a profile of a searcher's interests: have them complete a form, record their activities and search history, or look through the contents of their desktops and emails, capturing both their explicit and implied interests. Unfortunately, people often hesitate to share detailed personal information about their interests with search engines. Plus, an individual’s past search history may not be helpful in predicting their future intentions.
You could aggregate information collected from a large number of interactions between users and search engines. What pages do people click when faced with a list of search results? If the vast majority of those searching for java choose pages about programming, it would make sense to show more programming pages in search results and fewer pages about coffee.
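Aggregate click data can be folded back into ranking with something as simple as the sketch below. The click counts are invented for illustration, and real engines certainly combine many more signals than this.

```python
from collections import Counter

# Hypothetical aggregated clicks observed for the query "java"
clicks = Counter({
    "java-programming-tutorial.example": 9200,
    "java-coffee-guide.example": 700,
    "java-island-travel.example": 100,
})

def rerank(results, click_counts):
    """Order keyword-matched results by how often searchers chose them."""
    return sorted(results, key=lambda page: click_counts[page], reverse=True)

results = ["java-coffee-guide.example",
           "java-island-travel.example",
           "java-programming-tutorial.example"]

# The programming page, chosen by the vast majority, now comes first.
reranked = rerank(results, clicks)
```

The keyword match still decides which pages are candidates; the aggregated behavior only decides their order.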
You could study user actions — how they move their mouse pointer across a search results page, how long they stay at a selected page, how far down that page they scroll and many other possibilities.
Studying a series of searches from the same user may offer a glimpse into modified search behavior. How does an individual change their queries after receiving unsatisfactory results? Are search terms shortened, lengthened or combined with new terms? Comparing the selected results (Web pages) of one user to those of another using the same query could be very telling. Although the search engines will not share their strategies, it’s clear that this type of analysis is being used elsewhere on the Web. Consider the item-to-item recommendations that Amazon.com offers when people perform searches at that store (people who purchased this book were also interested in …). Now, imagine a search engine recommending pages selected by other users who searched using the same terms.
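That “people who searched this also chose” idea can be sketched from a query log. The log entries below are hypothetical, and a real system would need far more data and smarter filtering.

```python
from collections import Counter, defaultdict

# (query, clicked_page) pairs from an invented query log
log = [
    ("deep sea fishing", "charters.example"),
    ("deep sea fishing", "gear.example"),
    ("deep sea fishing", "charters.example"),
    ("coffee", "beans.example"),
]

by_query = defaultdict(Counter)
for query, page in log:
    by_query[query][page] += 1

def also_selected(query, top_n=2):
    """Pages most often chosen by other users who issued the same query."""
    return [page for page, _ in by_query[query].most_common(top_n)]
```

Calling `also_selected("deep sea fishing")` surfaces the charter page first, since it was chosen most often by users issuing that query, much like Amazon’s item-to-item recommendations.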
Add to that some other information that a search engine might collect about a user when a search is performed — location, language preferences indicated in their browser or the type of device they are using (mobile phone, handheld or desktop).
Search engines could learn a lot about Web searchers by examining the services that we select. Some of the information engines may be looking for from users:
- Search results clicked upon.
- Choices of interest in email alerts.
- Personalized search histories.
- Ads clicked upon.
- Bookmarked pages (Delicious, Yahoo My Web 2.0).
- Picture tags (Flickr).
- Annotations (Google Sidewiki, Twitter, Friendfeed, etc.).
- Web pages chosen for customized search engines (Google custom search).
- Queries used and pages selected in vertical searches (Google Maps, Yahoo local search, Google Product Search, etc.).
- Personal profiles (Orkut, MySpace, etc.).
- Query revisions and many others.
Stage Three – Learning From the People
At some point, the search engines may go beyond personalization based on interactions with search and other services to analyzing the footprints people leave on the Web itself. User profiles in places like MySpace or Facebook, “digging” at Digg.com, claimed blogs at Technorati and other emerging spots on the Web have given users not only the ability to put their personal stamp on countless pages, but also endless opportunities to leave their tracks all over the Web, and for someone or something else to study those tracks. Ask yourself: what does this imply when it comes to privacy?
Digital signatures associated with identities from initiatives like OpenID or TypePad authentication may provide even more insights about a person and their interests.
Personalization and SEO
Personalization should, and likely will, have a big impact on the way people search, on what site owners learn about their intended audiences and on how the effectiveness of SEO campaigns is measured, especially for SEO firms that use ranking reports as one way of gauging the efficacy of their efforts.
Can we learn from the evolving stages of search engines? In attempting to provide personalized search results, the focus of search engines’ efforts has shifted from matching keywords to knowing more about the true interests of searchers. Keyword matching still plays a role in what search engines do when returning results, but information gathered from those searchers is playing an increasing role in the results they see.
I was on the phone with a colleague a few months ago when he identified his highest-ranking competitor for his choice of keywords. I searched using his same terms and could not find the site that he claimed was at the top of the rankings. I asked him to scroll to the top of the Google search page he was viewing and tell me whether he saw a link labeled sign out at the far right. He did, meaning that he was signed in to Google and his query was being treated as a personalized search based on his past search history. In a non-personalized search at Google he was actually outranking his competitor’s site; it appears that, while signed in to Google, he had been visiting his competitors’ pages so often that they ranked higher for those keywords than his own site did. Clearly, personalization presents us all with some new challenges.
While we are left to speculate about search engine behavior and observe the changing landscape, there are some steps that an SEO professional or any website owner can take while anticipating the effects of personalization:
- Learn about Social Networking Theory and Online Social Networks.
- Recognize and share with clients the diminishing value of ranking reports.
- Aim to measure results and conversions in a meaningful way through log file analysis and Web analytics tools.
- Find ways to learn more about your intended audiences and existing customers.