SEO is Undead Again (Profiles, Phrases, Entities, and Language Models)

SEO and Keyword Matching

I don’t recall clearly when I first started calling what I do SEO, and I didn’t have an official title at my first in-house SEO position back in 1996. I thought of that role as webmaster, marketing manager, IT department, and technical consultant combined, and did whatever else needed to be done. A friend’s sister worked at Digital Corp, and one day she sent us an email about a new service they had started called Alta Vista.

That’s probably when we first started thinking seriously about search engines, and their potential to help or to harm businesses. When Google came along, we became a lot more serious about search.

Back in the days just before Google started gaining popularity, when the leading search engines included Alta Vista, Excite, Infoseek and Lycos, a paper titled What is a tall poppy among web pages? by Glen Pringle, Lloyd Allison and David L. Dowe explored the possible decision trees that those search engines might have used to decide how to rank pages.

They gave us the following list of possible ranking signals:

  1. Number of times the keyword occurs in the URL.
  2. Number of times the keyword occurs in the document title.
  3. Number of words in the document title.
  4. Number of times the keyword occurs in meta fields – typically keyword list and description.
  5. Number of times the keyword occurs in the first heading tag <H?>.
  6. Number of words in the first heading tag.
  7. Total number of times the keyword occurs in the document including title, meta, etc.
  8. Length of the document.

That list doesn’t vary too much from the kind of SEO analysis that many these days refer to as on-page SEO. But if you look carefully at the list, its focus is upon matching keywords in a document to the keywords used in a query.
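To make that keyword-matching focus concrete, here is a minimal Python sketch of how the eight signals in the list might be computed for one page. It is only an illustration, not the method from the paper: the function names, the use of word counts for "length," the plain substring match on the URL, and the restriction to single-word keywords are all my own assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_features(keyword, url, title, meta, first_heading, body):
    """Compute the eight keyword-matching signals for a single-word keyword.

    Assumptions (not from the paper): lengths are measured in words, the URL
    is checked with a plain substring count, and `body` holds the visible
    page text, including its headings.
    """
    kw = keyword.lower()
    title_words = tokenize(title)
    meta_words = tokenize(meta)
    heading_words = tokenize(first_heading)
    all_words = title_words + meta_words + tokenize(body)
    return {
        "keyword_in_url": url.lower().count(kw),              # signal 1
        "keyword_in_title": title_words.count(kw),            # signal 2
        "title_length": len(title_words),                     # signal 3
        "keyword_in_meta": meta_words.count(kw),              # signal 4
        "keyword_in_first_heading": heading_words.count(kw),  # signal 5
        "first_heading_length": len(heading_words),           # signal 6
        "keyword_total": all_words.count(kw),                 # signal 7
        "document_length": len(all_words),                    # signal 8
    }


features = keyword_features(
    keyword="poppy",
    url="http://example.com/poppy/tall-poppy.html",
    title="Tall Poppy Gardens",
    meta="poppy, garden, flowers",
    first_heading="Growing a Tall Poppy",
    body="Growing a Tall Poppy. A poppy needs sun.",
)
print(features)
```

Every value in that dictionary depends only on the page and the query keyword, which is exactly the limitation the rest of this post is about.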

When Google brought PageRank to search, we started thinking more about the importance of links pointing to sites, and the anchor text in those links. But SEO’s focus still seemed to be upon whether or not the keywords used in queries were also used upon the pages of a document, or the links pointing to a document.

Reranking Search Results

In November of 2003, something at Google changed, and the comfortable rankings that many websites had attained suddenly shifted. The change was referred to as the Florida update, following a practice developed at the Webmaster World forum of naming Google updates, which seemed to happen every 4-5 weeks, the way you would name a hurricane. There have been a lot of theories about what might have caused that shift in rankings.

One that I like, but won’t insist was the cause of the massive upheaval in rankings, is explored in a Google patent entitled Ranking search results by reranking the results based on local inter-connectivity.

What’s interesting about the patent is that it describes a way to take a certain number of the top search results from Google for a specific query, and rerank them based upon how they link amongst each other. For example, if you look at the top 100 pages that show up in the search results, and see which pages link to other pages in those results, the pages that are most linked to may be boosted in the rankings of the pages showing up for those queries.
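A rough Python sketch of that idea follows. It is not the scoring formula from the patent: the function name, the `weight` parameter, and the simple additive combination of the original score with a count of links from other pages in the same result set are assumptions made purely for illustration.

```python
def rerank_by_local_links(results, links, weight=1.0):
    """Rerank an initial result set by how its pages link to one another.

    results: list of (url, original_score) in the initial ranking order.
    links:   dict mapping a url to the set of urls it links to.
    Assumption: new score = original_score + weight * local in-link count.
    """
    in_set = {url for url, _ in results}
    local_inlinks = {url: 0 for url in in_set}

    # Count only links whose source and target are both in the result set.
    for source in in_set:
        for target in links.get(source, set()):
            if target in in_set and target != source:
                local_inlinks[target] += 1

    rescored = [(score + weight * local_inlinks[url], url) for url, score in results]
    # sort() is stable, so pages with equal new scores keep their original order.
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in rescored]


initial = [("a", 3.0), ("b", 2.5), ("c", 2.2)]
link_graph = {"a": {"c"}, "b": {"c", "x"}, "c": {"a"}}
print(rerank_by_local_links(initial, link_graph))
```

In this toy example page "c" starts in third place, but because both of the other top results link to it, it moves to the top. The link to "x" is ignored because "x" is not one of the initial results.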

The Local Interconnectivity approach may or may not have been the cause of the changes from the Florida update, but the kind of reranking that it describes could cause the kinds of changes seen.

I’ve written a number of posts about other approaches that the search engines may use to rerank and filter results, and many of those filters are part of the evolution of search as well. A few of those posts compile many of the reranking approaches, and show some of the many ways that we’ve moved on since the early days of keyword matching:

Looking Beyond Web Publishers and Keyword Matching

Many of the methods described in those posts still rely upon a ranking approach that combines the kind of on-page factors that I listed above with information about links from web publishers. But they mostly ignore one of the more interesting kinds of data that the search engines have all been collecting for years – how searchers actually use the web, how they:

  1. Perform searches
  2. Browse the Web
  3. Refine queries during search sessions
  4. Click upon certain results
  5. Pass over other search results
  6. Spend more or less time upon pages
  7. Bookmark, save, or print other pages
  8. Interact in other ways with pages browsed and search results seen.

I wrote a post earlier this year, Improved Web Page Classification from Google for Rankings and Personalized Search, about a patent that describes how Google might classify pages based upon profiles that they create for websites, queries, and users. The basic ideas behind it aren’t so different from a post I wrote back in 2007 about a Microsoft patent that describes a similar approach – Personalization Through Tracking Triplets of Users, Queries, and Web Pages.

In a way, both describe how a search engine might transform from displaying pages based upon keyword matching to one that recommends pages based upon actual user behavior and the possible intent behind searches.

A snippet from that post:

Imagine a search engine keeping track of each user (u), as they perform a query (q) on the search engine, and seeing which pages (p) they click upon, and collecting those selections in what they call “triplets” of data, represented like this – (u,q,p).

Then consider that the search engine might map and compare those triplets of information against each other to see what kinds of relationships and associations exist between people making the same searches, faced with similar results. That information could then be used to personalize results shown to individuals.
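Here is a small Python sketch of how (u,q,p) triplets might be compared to personalize results. It is a toy under stated assumptions rather than the approach from the Microsoft patent: the function names, the overlap-count similarity between users, and the sample data are all hypothetical.

```python
from collections import defaultdict


def clicks_by_user(triplets):
    """Group (user, query, page) triplets into {user: {(query, page), ...}}."""
    by_user = defaultdict(set)
    for user, query, page in triplets:
        by_user[user].add((query, page))
    return by_user


def peer_weights(by_user, user):
    """Weight every other user by how many (query, page) clicks they share."""
    mine = by_user.get(user, set())
    weights = {}
    for other, clicks in by_user.items():
        shared = len(mine & clicks)
        if other != user and shared:
            weights[other] = shared
    return weights


def personalize(triplets, user, query, candidates):
    """Reorder candidates, boosting pages that similar users clicked for query."""
    by_user = clicks_by_user(triplets)
    weights = peer_weights(by_user, user)

    def boost(page):
        return sum(w for other, w in weights.items()
                   if (query, page) in by_user[other])

    # sorted() is stable, so pages with equal boosts keep their incoming order.
    return sorted(candidates, key=boost, reverse=True)


# Hypothetical click log.
log = [
    ("ann", "jaguar", "cars.example"),
    ("bob", "jaguar", "cars.example"),
    ("bob", "mustang", "ford.example"),
    ("cat", "jaguar", "zoo.example"),
    ("cat", "mustang", "horses.example"),
]
print(personalize(log, "ann", "mustang", ["horses.example", "ford.example"]))
```

Ann has never searched for "mustang," but she clicked the same "jaguar" result as Bob, so Bob's choice for "mustang" is moved ahead of the page chosen by Cat, whose earlier click suggests a different interest.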

Imagine that instead of just showing pages in response to that type of user data, the search engines also started spending more time presenting possible query refinements to get at the heart of the intent behind a search – whether there was a certain informational need or situational task that a searcher might be trying to address.

When people talking about search engines throw the word “semantics” into a description of the search, there’s often a discussion on mathematical models that might be used to try to understand the meanings behind words. There’s sometimes a reference to the company Applied Semantics, which merged with Google back in 2003, but it seemed on the surface that the methods that Applied Semantics brought to Google were being used with Google’s advertising.

A Google patent published earlier this year, which I wrote about in Search Based upon Concepts: Applied Semantics and Google, describes how the Applied Semantics method could be used in Web search, providing a more interactive approach that shows suggested query refinements, which may expand original queries beyond keyword matching to get at the meaning behind a query.

The globalization of search also means that search engines need to understand multiple languages and return results to searchers around the globe. Google has hinted at becoming a multilingual search engine with the development of language models. These language models are also a key to looking beyond keywords found upon a page.

For example, if you were to take a phrase like “auto mechanic,” and translate it into French, and then translate it back into English, there might be a few reasonable results found in that translation back, such as “car mechanic,” and “automobile mechanic,” and “auto mechanic.” In my post Google Synonyms Update, I described some of the approaches to synonyms that Google might be using to expand queries for searchers that broaden search results in reasonable ways.
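The round-trip idea can be sketched in a few lines of Python. The two lookup tables below are tiny hand-made stand-ins for real translation models, and the function name is hypothetical; nothing here reflects how Google actually builds or uses its language models.

```python
def round_trip_candidates(phrase, to_pivot, from_pivot):
    """Translate a phrase into a pivot language and back, collecting results.

    to_pivot / from_pivot are dicts mapping a phrase to a list of possible
    translations (toy stand-ins for statistical translation tables).
    """
    candidates = set()
    for pivot_phrase in to_pivot.get(phrase, []):
        candidates.update(from_pivot.get(pivot_phrase, []))
    return sorted(candidates)


# Hypothetical toy tables, for illustration only.
EN_TO_FR = {"auto mechanic": ["mécanicien automobile", "mécanicien auto"]}
FR_TO_EN = {
    "mécanicien automobile": ["car mechanic", "automobile mechanic"],
    "mécanicien auto": ["auto mechanic", "car mechanic"],
}

print(round_trip_candidates("auto mechanic", EN_TO_FR, FR_TO_EN))
```

Phrases that come back from the round trip are candidates for query expansion; a real system would also need some measure of confidence before treating any of them as a synonym.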

Phrases and Named Entities

Several patent filings from Google look beyond matching individual keywords in a document to identifying and discovering phrases that have unique meanings.

Search engines sometimes took a shortcut in indexing some words, by not including some very frequently occurring words in searches. Terms like “a,” or “the,” or “on,” or “of,” would be passed over. But sometimes those frequently occurring words added meaning in the right context.

Google published a patent about meaningful stopwords and stop-phrases a couple of years ago. If you search for the word “matrix” you tend to see results about mathematics. If you search for “the matrix,” you tend to see results about a movie of the same name.

Another set of patent filings from Google described how the search engine might distinguish between “good” phrases and “bad” phrases, and rerank search results based upon whether or not a certain number of good phrases appeared in a top number of search results. For example, if you searched for “baseball stadium,” the search engine might look at the top 100 results, and calculate which other “good” phrases appeared in those results. Pages with more of those co-occurring phrases (up to a certain point) might be boosted in search results and rank higher.
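As a hedged illustration of that kind of reranking, the Python sketch below boosts results that contain phrases shared by several of the top results. The parameter names (`min_docs`, `cap`, `boost`), the substring matching, and the additive scoring are assumptions of mine, and the list of candidate "good" phrases is taken as given rather than discovered the way the patents describe.

```python
from collections import Counter


def rerank_by_related_phrases(results, good_phrases, min_docs=2, cap=3, boost=0.5):
    """Boost results that contain phrases co-occurring across the top results.

    results:      list of (url, score, text) for the top results of a query.
    good_phrases: lowercase candidate phrases, assumed already identified.
    min_docs:     a phrase counts as "related" if it appears in this many results.
    cap:          maximum number of related phrases that earn a boost.
    """
    doc_counts = Counter()
    for _, _, text in results:
        lowered = text.lower()
        for phrase in good_phrases:
            if phrase in lowered:
                doc_counts[phrase] += 1

    related = {phrase for phrase, n in doc_counts.items() if n >= min_docs}

    rescored = []
    for url, score, text in results:
        lowered = text.lower()
        hits = sum(1 for phrase in related if phrase in lowered)
        # Cap the boost so that stuffing a page with phrases stops paying off.
        rescored.append((score + boost * min(hits, cap), url))

    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in rescored]


top_results = [
    ("a", 2.0, "Baseball stadium tickets on sale"),
    ("b", 1.8, "The baseball stadium hosts a major league ball park tour with home plate seats"),
    ("c", 1.5, "Major league schedule and home plate seating"),
]
phrases = ["major league", "home plate", "ball park", "tickets"]
print(rerank_by_related_phrases(top_results, phrases))
```

Here "major league" and "home plate" each appear in two of the three results, so pages "b" and "c" are boosted past page "a", which contains neither.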

I wrote a post this year, Phrasification and Phrase Posting Lists, about the second generation of patents from Google that explored how the search engine might implement a phrase-based indexing system to rerank search results by creating posting lists that indicate which good phrases the pages within its index might contain. Again, the idea is that if a page tends to use many of the same phrases that other pages on the same subject use, it’s more likely to be about that topic.
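A phrase posting list is essentially an inverted index keyed by phrase instead of by single word. The minimal Python sketch below shows the data structure only; the function names, the substring matching, and the sample documents are hypothetical, and the patents describe considerably more than this.

```python
from collections import defaultdict


def build_phrase_postings(docs, good_phrases):
    """Map each good phrase to the sorted list of document ids containing it.

    docs: dict of {doc_id: text}; good_phrases: lowercase phrases to index.
    """
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        lowered = docs[doc_id].lower()
        for phrase in good_phrases:
            if phrase in lowered:
                postings[phrase].append(doc_id)
    return dict(postings)


def docs_with_all(postings, phrases):
    """Return ids of documents that contain every one of the given phrases."""
    id_sets = [set(postings.get(phrase, [])) for phrase in phrases]
    return sorted(set.intersection(*id_sets)) if id_sets else []


# Hypothetical mini index.
pages = {
    1: "Baseball stadium and home plate",
    2: "Home plate and major league",
    3: "Major league baseball stadium",
}
index = build_phrase_postings(pages, ["baseball stadium", "home plate", "major league"])
print(index)
print(docs_with_all(index, ["baseball stadium", "major league"]))
```

With lists like these, finding the pages that share several phrases on a topic becomes a matter of intersecting a few posting lists rather than rescanning page text.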

The phrase-based indexing system also looks at anchor text pointing to pages and may give more weight to links that use words and phrases that co-occur or are related phrases.

Taking the idea of looking at phrases one step further, some words and phrases can be said to be “entities,” in that they refer to specific people, places, and things. John Wayne is an entity, as is the Empire State Building. A brand is an entity, and an idea is an entity.

A search engine might look at pages on the web, and extract information about those specific “named entities,” and collect that information to provide answers to questions, such as “when was John Wayne born.”

It might also attempt to identify entities in queries, and associate those entities with specific web sites when there seems to be a certain level of confidence about the relationship between a site and an entity. The following posts provide some examples of how named entities might impact search:

Conclusion

Search engines are evolving along several paths from their early days of keyword matching, including incorporating user-behavior data into ranking pages, creating statistical language models, using semantic ontologies like that from Applied Semantics to become more interactive, understanding phrases better, understanding when phrases may refer to a specific person, place or thing, and more. I’m just scratching the surface with this post on the many directions that they are taking.

SEO is becoming more complex, but the ultimate goal is still to try to find useful and meaningful results for people trying to fulfill informational and situational needs. Search is changing, and the way that people search is changing as well, whether they try to use a conventional search engine, or even attempt to have a network of friends and associates provide answers on social sites.

If your idea of basic SEO echoes the list above from the 1998 Tall Poppies paper, with backlinks and PageRank thrown in for good measure, you’re going to find yourself mystified by things like the Google MayDay ranking changes that happened earlier this year, reducing visits based upon long-tail queries for many sites while increasing them for others.

This is the last of the “SEO is Undead” series.

The first post in the series, SEO is Undead 1 (Links and Keyword Proximity) explored how ideas about linking and how close a keyword might be to other keywords on a page might have transformed since the early days of SEO.

The second post in the series, Son of SEO is Undead (Google Caffeine and New Product Refinements), explored a little more deeply how changes sometimes take place in search and SEO.

What changes have captured your attention over the years?

83 thoughts on “SEO is Undead Again (Profiles, Phrases, Entities, and Language Models)”

  1. Hi Bill,

    I’ve really enjoyed this series, so thanks for posting. I think Google’s ability to profile both users and websites is massively powerful, and this, tied in with local search, is really going to have a big impact on SEO in the near future.

    I think the days are numbered for directory-based websites topping the results for local search terms… profiling businesses and their websites by utilising tools such as Google Places is already beginning to drive local search, and will get better over time.

  2. I agree with Trillo that Google’s most recent change with local search will have a large impact. The change makes complete sense too. It’s happening now, but soon we’ll see a big shift to searches from mobile or hand-held devices. More and more people are getting GPS enabled smartphones and are using them for searches. With these phones, you’re essentially searching ‘local’ wherever you are.

  3. Bill Slawski – It is definitely amazing how much search engines like Google have changed. SEO will never stay the same, and as long as technology improves, so will the methods search engines use to rank sites.

    However, I think the fundamentals of search engine optimization won’t change much and the trendy SEO tactics will be a memory.

    With that being said, I feel if you learn the fundamentals of good SEO practice you won’t have to worry much about changes being made.

  4. Hi Bill,

    This post gave me goosebumps. I remember back in 2004, when I had just started in SEO copywriting, our mentors talked excessively about the Florida and Big Daddy updates, which affected the established sites at the top of the rankings.

    It’s always great to read these fundamentals of search and be reminded of how fast evolving search and SEO is.

  5. Not going to pretend to have a fancy answer!
    For me it was a pivotal moment in my learning when Anil Dash won that competition, and I read about his moves…yes, they were pretty basic, but it was the tweaks Google enacted afterwards which caught my attention… (Florida). Reading your post I have a lot to learn. Again, thanks for sharing!

  6. These articles have been good explorations of the evolution of search indexing technology. People in our industry absolutely need to think about what may come next.

  7. When you step back and look at SEO over the time period you discussed, things really have changed. In my opinion, this rapid change tells me that not only will SEO continue to be important, but its importance will continue to grow, and SEO companies that want to survive will be the ones that change as quickly as the game itself.

    On a side note, I actually have a little beef with Google looking beyond matching individual keywords in documents so as to identify and discover phrases that have unique meanings and here’s why…I post in forums frequently and like to have a snazzy looking avatar. I occasionally change them and every time I do I search the keyword “avatar” in Google images and last year, I had to search through pages and pages of image results just to find one that didn’t reference that blasted movie…maybe they should back that one off a bit…

  8. My first SEO position was similar to yours — I was an all purpose webmaster/designer/networking guy/printer fixer/Windows upgrader/etc for a small business charged with getting traffic to the corporate site. The days of handcoded html and completely static pages. I sort of miss them, but mostly don’t.

    Things have definitely changed a lot. I remember when putting spammy white text on a white background actually did work. And I remember the update where it didn’t.

    It’s really impressive how things have changed in a relatively short period of time, but I’m also surprised at how much has remained the same. Exact match in the title and URL still helps big time. Exact match in the domain, even better. Longer articles still often seem to equate to increased authority. Offsetting a keyword rich paragraph by thousands of pixels to the left is relatively commonplace and, last time I checked, still effective.

    The search engines are much smarter, and they do take trends/user intent/etc into account. But at their core, they’re still relying on very primitive signals. And I’m honestly surprised by that.

  9. I see more and more quick shifts between results, and it seems like Google is constantly testing CTRs to determine the best value for the user. Also, with Instant, it seems that the design, layout and information architecture will play a critical role in getting more CTR = ranking even better.
    Cheers

  10. Nice Article Bill. Great factors and points. Keep on posting under SEO Undead series.

  11. Great series! I truly enjoy professional posts that oppose the “SEO is dead” posts we see pop up every few months. I composed a few blog posts on SEOmoz.org back in 2007 discussing how Google may use user statistics to influence search. I started thinking this way as soon as Google emphasized stats like “Bounce Rate” on the beta version of Google Analytics at the time in 2007. Love this stuff.

  12. The SEO is dead discussion seems like a moment of collective insanity on the part of social media aficionados who may be having difficulty with ROI calculations. On another note, I remain somewhat surprised that user experience metrics do not seem to be a bigger factor in the algorithm. Any developments in this direction?

  13. For some reason I had never thought of looking at Google patents as a method of trying to get a hold on how it all works. Thanks for pointing me in this direction. As a relative newcomer to SEO I am constantly worried about how future updates are going to affect my ranking, so it is really important to at least attempt to keep up with the mysterious workings of the search engines.

  14. Somehow I landed here in the middle of the series…
    One interesting development is the effects of social media with SEO… my prediction is one day this is going to be huge… for now it’ll be a matter of a few people (including Google) figuring out how to play this out.

    Thanks for the series.

  15. The fact of the matter is that Google has already started to get behind the intent of a user’s search with the whole instant Google search factor. That in combination with the site previews (where you can see a preview of a site’s homepage from Google) plus the merging of local places results into the organic searches are three major drastic changes, all within the last month or so.

    Part of being in the SEO business is about being adaptable; you have to be like a chameleon and always prepared for the unexpected and Google, as we all know, is constantly evolving. It makes you wonder how search will be determined in a few years; will backlinks even be as relevant or completely obsolete?

  16. Thanks for the detailed fleshing out. Yes, language is going to be driving results because Google is still trying to teach their AI better language skills and an understanding of relevance. This will be a constant process of refinement and tweaking. We still see grossly bad results as well as pretty good results. Because of the difficulties of learning how words “mean” they are using inexact criteria such as word proximity and frequency. If you look at the language they’ve used to communicate to the SEO community over the last year they keep using the term “synonyms”. Humans create meaning. Machines, no matter how sophisticated, can only parrot that process using artificial constructs. Understanding those constructs will ultimately give the insight into unlocking the SEO puzzle.

  17. Great post once again! Have to give it to you for providing us such in-depth knowledge every time… As always I am enlightened by the information, as when the events mentioned occurred I wasn’t even aware of what SEO was :)…

    P.S: Bill can you drop me a mail in my inbox, I need to talk to you… Drop the mail in the email mentioned above. Thanks

  18. Good post as usual, I’ve really liked the latest ones.

    One thing that struck me about what you chose to call Phrasification is that I consider it a bit backwards. I know a lot of other research points in the same direction, but I’m still not certain whether this implies that the use of common phrases within the topic causes the rank or the other way around.

    One quite funny example in my opinion is James Joyce. Joyce would be totally screwed in all serps due to his more or less unique language. A search for his own Finnegans Wake wouldn’t show his site (if he had been the kind of man to blog back in the thirties) as he would be the only blogger using his phrases. Of course Joyce is an extreme example, but I’d claim it’s valid. A truly revolutionizing SEO blog for example would be talking about things the others don’t, therefore not ranking well.

    It’s still a very interesting topic though and it would be great if you wanted to write more about it.

  19. Pingback: SEO is a Zombie! The Ultimate “SEO is Dead” (or Not) Resources List: 1997 till Infinity | SEOptimise
  20. Hi Dave,

    Thanks. The Florida update was the topic of a lot of conversations, and there were a lot of people who were angry about traffic to their sites from Google drying up like that. Too many of those site owners probably relied too much on traffic from Google, without rounding out their marketing plans to bring themselves visitors from other places as well.

    The Big Daddy update didn’t happen until late 2005, and it was a change more like Google Caffeine in that it wasn’t directly a change to ranking algorithms, but rather an infrastructure upgrade of the software and hardware that Google used for Web search.

    Chances are that one of the main changes with Big Daddy was a bigger index, with multiple parts, where some pages were included in the main index with a full range of ranking features collected about them, and other pages were placed in an extended or supplemental index, where less information regarding ranking signals was collected for them. Matt Cutts wrote about some of the changes (in a general manner) here:

    Feedback on Bigdaddy data center

    Search definitely is evolving fast. One of the most important things any SEO can do is to try to keep up. 🙂

  21. Hi Trillo Digital,

    Local search definitely has come to the forefront lately, with the Google Place Search change, where local search results are appearing inline with regular web search results rather than in a separate box segmented away from those results. I’m not convinced yet that there was any change in the way those results are ranked, rather the change is in how they are displayed. But that change in display appears to make them more prominent, and that may influence more people to click upon them.

    There’s still some value in location based directory sites, and there are some pretty large ones that have gained some prominence lately, like Yelp, and I’m not sure that they are going away anytime soon. But, I think you’re right that Google may be more than happy for people to visit Google Place pages for businesses rather than a business profile page in one of those directories.

  22. Hi Michael,

    Thank you. It was a fun series to write. To paraphrase something Matt Cutts said recently, don’t look at where Google is now, but rather where they are going in the future. Or as Wayne Gretzky said before him:

    I skate to where the puck is going to be, not where it has been.

  23. Hi Dave D,

    Good point. I expect within 5 years or so that the percentage of people accessing the Web through mobile devices will dwarf the number of people doing so through a desktop or laptop computer. Phones are cheaper, easier to carry around, and there are many possibilities for wireless access to become considerably more pervasive. Location based services and applications are growing at an incredible rate as well. Local search and mobile search are tied together in so many ways.

    We’re also seeing the possibility of a future where Web browsing may hold a much more prominent place in the entertainment centers of most people’s homes, with Web enabled televisions and streaming movie rentals.

    What does it mean for search, for searchers, for the Web, and for SEO when people start visiting the Web more frequently on tiny handheld screens or super large televisions? For mobile searches, you’re right that everywhere you go has the potential to be “local.” It’s something that many businesses and organizations that don’t have much of a web presence presently should be paying attention to – being able to be easily found online by mobile searchers may be a considerable competitive advantage.

  24. Hi Jason,

    Just what are those fundamentals these days? The list at the top of this post from the 1998 paper shows some signals that were believed to help in the rankings of web pages, but we’ve been hearing for a few years from Google that they look at more than 200 ranking signals. Microsoft noted in one of their pages about their RankNet ranking system that they look at more than 500 ranking signals. Those aren’t “trendy SEO tactics,” but rather the reality that search engines have changed.

    It’s likely that the things that I think are fundamentals of SEO are much different than what you think they might be. I was asked to participate in a session at a search conference on “Basic SEO,” along with some others, and we were told to talk amongst ourselves as to what each of us would present upon. Some wanted to talk about the importance of page titles and headings. Others wanted to discuss links and anchor text.

    When I offered some of the topics I wanted to discuss, the others told me that my suggestions were advanced SEO, and that they would be too complicated and too hard for the audience to grasp. Yet, their idea of basic SEO went back to that 1998 list in my post, ignoring things like Universal Search and optimization for news, video, local search, blogs, and others. Today’s SEO basics are very different from yesterday’s.

  25. Hi JC,

    I remember the Nigritude Ultramarine competition with Anil Dash, too. Most people on the web aren’t Anil Dash, and don’t have the kind of connections and followings that he does. Anil won the contest in 2004, after the Florida update happened, and most likely won on the strength of links from other bloggers more than anything else – lots of on topic links can make a difference.

    A book on the history of search and SEO might be pretty interesting, and fun.

  26. Hi Mark,

    One of the most challenging things about SEO is the constant change. It also happens to be what makes doing SEO so fascinating.

    I agree with your point about Google sometimes trying to do your thinking for you, and guessing your intent without giving you the chance to tell it that it’s focusing too much on something that you aren’t interested in.

    I mentioned the Applied Semantics approach to meaning based search because the method that they present in their patent on web search involves a more interactive type of search where you might be asked something like “Do you mean images from the movie ‘Avatar’?” to give you the chance to either focus upon those, or filter them out of results. I’m guessing that we might see more of that kind of query refinement type questioning in the future.

  27. Hi Kevin,

    The early days of SEO were fun and challenging, though sometimes the challenges were things like designing a template to generate bulk invoices from QuickBooks, rather than doing keyword research or tweaking HTML. I don’t miss making global changes to HTML on purely static websites – a lot of work.

    I had someone working with me who would sometimes read about tactics for ranking in search engines, and make changes, like adding a few lines of white text on a white background, not realizing that the search engines might take offense at the practice.

    A number of the signals that search engines look at still remain effective because there isn’t much doubt about them. Keywords in page title – the title should be what the page is about, so that makes sense. Keywords in URLs – if the title and the URL both have the same keywords, and those appear in the body text of the page as well, they begin to make a strong case about those keywords. Primitive signals, but more dependable than many others. Search still does have a long way to go though.

  28. Hi Jeremy,

    I have mixed feelings involving some of the “SEO is Dead” posts that do spring up from time to time.

    I love seeing posts that challenge and criticize the SEO industry in thoughtful ways, that ask us to become introspective and look closer at what we do, and how we present ourselves, and how we might meet certain challenges.

    For instance, when Google started introducing personalized search, so that every searcher might see different search results for the same query, there were a lot of posts that questioned how people involved in SEO might respond. Lately, there have been a few posts that look at the growing influence and impact of social media and social networking, and the popularity of sites like Facebook, and how that might impact SEO.

    There are people who say that the true test of character isn’t necessarily how you act and interact with others everyday, but rather how you act when you find yourself placed in awkward situations, when you’re challenged or criticized or slighted. How you respond to those types of situations provides some insight into who and what you are.

    Many of the “SEO is Dead” type posts we see are from people who don’t know very much about SEO to begin with, or who don’t know much about search, search engines, and how those are evolving. A number of them are purposefully written as efforts to draw traffic, and often paint professional SEOs with the same broad brush as email spammers, website hackers, and people creating very low quality made-for-AdSense type sites.

    I actually think there’s value in those types of posts if they can inspire people within the SEO industry to discuss how to make the industry better and to improve the quality of what we offer to site owners, and to talk about things like conversion rates, analytics, and user-behavior. The days of looking at pure rankings related to SEO are over – looking at conversions, usability, calls to action, and bounce rates (where and when they have some value), have entered into the realm of SEO, which is a welcome change.

  29. Hi Chande,

    I think we have seen search engines becoming more like recommendation systems over the last 5 years or more, and that’s something I’ve been pointing out for a while.

    One of the standard stock statements we see in interviews from search engineers, and in articles about search engines, is an emphasis on making changes to search engines to provide the “best user experience to searchers.” I think in the majority of cases that’s true – that the search engines are responding to changes in the way people search, whether real or perceived as such. For example, social networks like Twitter and Facebook have grown tremendously, and the search engines have been developing faster indexing of those systems, have started displaying results from them more quickly, and have made it easier to find information from people who you might be connected to on those types of networks.

    I agree with you on the new previews in Google, too. The Instant Previews will likely impact the look and feel of websites that want to receive traffic from Google, and I think it illustrates how search engines also play a role in how the Web evolves.

  30. Hi Randy,

    I do think that it’s an important discussion on how social media and search might intersect, and influence each other. I’m not sure that discussion should be couched in terms of “SEO is irrelevant,” or “SEO is now Dead, and the New King is SMO.” The search engines have been looking at social signals for years, in user generated content, in forums and blogs, in Flickr and Facebook and Twitter, and in many other areas where the barrier to entry to publishing your thoughts and opinions and information about the world and the Web has diminished tremendously. SEOs have been looking at those signals as well.

    On another note, I remain somewhat surprised that user experience metrics do not seem to be a bigger factor in the algorithm. Any developments in this direction?

    The language is pretty technical, but Google’s patent Model generation for ranking documents based on large data sets is one of many from Google that looks at how user behavior could be used to influence rankings from the search engine.

    Here’s a snippet from that patent.

    For purposes of the discussion to follow, the set of data in repository 220 (FIG. 2) may include multiple elements, called instances. It may be possible for repository 220 to store more than 50 million instances. Each instance may include a triple of data: (u, q, d), where u refers to user information, q refers to query data provided by the user, and d refers to document information relating to documents retrieved as a result of the query data and which documents the user selected and did not select.

    Several features may be extracted for any given (u, q, d). These features might include one or more of the following:

    – the country in which user u is located,
    – the time of day that user u provided query q,
    – the language of the country in which user u is located,
    – each of the previous three queries that user u provided,
    – the language of query q,
    – the exact string of query q,
    – the word(s) in query q,
    – the number of words in query q,
    – each of the words in document d,
    – each of the words in the Uniform Resource Locator (URL) of document d,
    – the top level domain in the URL of document d,
    – each of the prefixes of the URL of document d,
    – each of the words in the title of document d,
    – each of the words in the links pointing to document d,
    – each of the words in the title of the documents shown above and below document d for query q,
    – the number of times a word in query q matches a word in document d,
    – the number of times user u has previously accessed document d, and
    – other information.

    In one implementation, repository 220 may store more than 5 million distinct features.

    While that list only provides a small sample of the kinds of information that a search engine might look at, and includes a lot of information about the words that appear on pages and in links and queries, it also implies that other kinds of information related to user interaction might also be collected, such as possible dwell time on a page, whether a page is bookmarked or saved or printed, and much more.

    All of the search engines have been actively collecting information about how people use search engines, and the web itself, and it’s possible that the amount of user-interaction data they’ve collected dwarfs the amount of information in their index about the content of web pages themselves. The challenge may not be so much in collecting that data either, but rather in how to use it.

    Storing information about specific triplets of information involving users, queries, and documents (u, q, d) is a step towards building a recommendation engine that considers rich profile information for users (and groups of users), websites, and queries, and user interaction metrics play a strong role in using that data.
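
    A minimal Python sketch of the (u, q, d) instances and a handful of the features quoted above. The `Instance` fields, the `words` tokenizer, and the feature names are all hypothetical choices made for illustration; the patent does not specify a data layout, and nothing here is meant to describe how Google actually stores or extracts this data.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Instance:
    """One (u, q, d) triple; every field name here is an assumption."""
    user_country: str      # part of u
    query: str             # q
    doc_url: str           # part of d
    doc_title: str         # part of d
    doc_text: str          # part of d
    prior_visits: int = 0  # times user u previously accessed document d


def words(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_features(inst: Instance) -> dict[str, object]:
    """Derive a few of the listed features from a single instance."""
    q_words = words(inst.query)
    d_words = words(inst.doc_text)
    host = urlparse(inst.doc_url).hostname or ""
    return {
        "user_country": inst.user_country,
        "query_string": inst.query,
        "query_words": q_words,
        "query_length": len(q_words),
        "url_words": words(inst.doc_url),
        "top_level_domain": host.rsplit(".", 1)[-1],
        "title_words": words(inst.doc_title),
        # Number of times a word in query q matches a word in document d.
        "query_doc_matches": sum(d_words.count(w) for w in set(q_words)),
        "prior_visits": inst.prior_visits,
    }
```

    Calling `extract_features` on many such instances would yield the kind of sparse feature set the snippet describes, though a real system would need far more careful tokenization and language handling.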

  31. Hi EPC,

    You’re welcome. I noticed a number of years back how I could find a lot of new questions to ask about how search engines might work by keeping an eye out for primary publications from them like whitepapers and patents. Sometimes there were some answers in those too, but often the questions they raised or implied were more interesting.

  32. Hi Alberto,

    I think that’s really a very important question – how will search and social media evolve together. I also think it’s something that people at the search engines have been exploring for a few years now already. It’s definitely worth probing and questioning and exploring and testing. I agree that it’s going to be huge, and in some ways, it already is. The web is growing and changing and evolving, and unlike a book, the conversation goes on in real time.

    In some ways search engines are a way to find repositories of information, and in other ways, they act as ways to find ongoing conversations, and interact.

  33. Hi Joe,

    Interesting points.

    One of the things that tends to mystify me when people write about search engines and relevance is that they cast together a lot of different things under one umbrella when it comes to the term “relevance.”

    A few different kinds of relevance:

    – A web page might be relevant for a query when both the page and the query contain the same words.
    – A web page might be relevant for a query when both fall under the same classifications, even if the same words aren’t used by both.
    – A web page might be relevant for a query when the page fulfills an information need intended by the query, again, even if the same words aren’t used in either
    – A web page might be relevant for a query if it provides a response that fulfills a situational need, once again, even if terms in the query and web document don’t match.

    Word proximity and frequency were pretty helpful for finding relevant pages back in the 90s when search engines were focusing upon keyword matching, and trying to distinguish between pages that all contained the keywords (or most of them) from a query.

    A lot of approaches have developed, and are still developing, that look at other information. One example is a statistical machine learning translation approach that uses language models to translate a query from one language to another, and then back again, to try to find words with similar meanings with a high level of confidence.
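
    The word frequency and proximity signals mentioned above can be illustrated with a minimal Python sketch. The function names and the brute-force window search are assumptions made for illustration only; they are not taken from any search engine's documentation, and a production system would use positional indexes rather than rescanning text.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency(query, document):
    """Count how often each query word appears in the document."""
    doc = tokenize(document)
    return {w: doc.count(w) for w in tokenize(query)}


def min_span(query, document):
    """Size in words of the smallest window containing every query word.

    Returns None when at least one query word is missing from the document.
    Brute force over all position combinations, so only suitable for toys.
    """
    doc = tokenize(document)
    positions = [
        [i for i, w in enumerate(doc) if w == q]
        for q in set(tokenize(query))
    ]
    if not positions or any(not p for p in positions):
        return None
    return min(max(combo) - min(combo) + 1 for combo in product(*positions))
```

    A smaller span suggests the query words appear close together, which is the kind of signal that helped separate pages that all contained the same keywords.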

  34. Hi Seo agency,

    I believe that a lot of the ideas behind new developments at Google and the other search engines really shouldn’t come as too much of a surprise.

    The ideas behind Google Instant have been percolating at Google for more than 6 years – they describe it in a patent filed in 2004, so it’s not something that’s really all that new. Previews like the ones from Google Instant Preview have been available at sites like ask.com for years as well.

    Finding hints about what might come, and being able to adapt to them, is a valuable skill for anyone engaging in SEO. Asking yourself a lot of “what if” questions, such as “what if back links lost their value in search rankings completely,” is a good way to prepare. After all, Google’s deal with the exclusive right to use PageRank is supposedly expiring in 2011. What does that mean for website owners? 🙂

  35. Hi Martin,

    Thank you very much. You raise some great points.

    The term “phrasification” isn’t one that I coined or created, but rather comes from Google’s patent. From the abstract itself:

    Phrases in a query are identified based on possible phrasifications.

    Regardless of whether either of us finds it backwards, I think it’s important to get an idea of some of the ideas and assumptions that come from the search engines, and the possible methods that the search engines might use to identify phrases as they appear in queries is something worth taking note of.

    To a degree, I can relate with Joyce. I don’t get a lot of visits from people searching for terms like “phrasification,” but I find writing about topics like that worth the effort, both from the stance of me learning while I explore topics like that, and being able to share and educate, and hopefully entertain to some degree. I’d write about things like phrasification even if I didn’t have any visitors to this blog. 🙂

    I probably will write more about phrase-based indexing.

  36. Guess I should have actually taken time to read the patent but I was hoping keeping an eye on your blog would be enough 🙂

  37. Hi Bill,

    Thanks for the insights and comments. I believe that Google will in the near future develop, or at least try to develop, a recommendation system similar to Facebook “likes”. I’ve seen great shifts in traffic on sites that have many likes, and what is the original PageRank algorithm other than popularity?

    Add to this the social/niche background of the players (for example, people in academic circles liking some pages with great research papers more than others) and you have (from Google’s point of view) rid yourself of spammers. But here Google will probably play on the relevancy vs. income part of the game 🙂

    Cheers

  38. I’ve read over at another blog about SEO being undead, and a picture of a zombie was featured. And I think zombies are the best thing to describe SEO. You think it’s dead, but it’s not; it’s still alive and it’s moving. Great post!

  39. Hi Bill, (sorry for my English)

    Thanks for this great series of articles.
    I use this comment because i would like to hear you about the and the use it can be done for information retrieval.
    I know my question is stupid: everybody knows that a good is very important. OK, but I’ve never heard anything other than: olala, you don’t know that! Ouuuuh, this guy is an SEO practitioner and he asks questions about the title, and so on …
    But in fact ???

    Thanks a lot.

  40. I think the next major influence (outside of search technology) will be the inevitable destruction of net neutrality and often contemplate how it will impact the industry.

  41. Thanks Bill for your history lesson and updates on how SEO has evolved. The key for the search engines should remain the same, and that is to provide content that satisfies their searchers’ needs. We just need to be in position to accomplish this by offering great value and solutions to problems, which makes this a win for the search engines, searchers, webmasters and advertisers.

  42. Thank you Bill for a great series. As things get more and more complex, sometimes it’s easy to miss the forest for the trees. I find that if I read through the patents, and then spend a few days shut off from everything doing nothing but learning and practicing Google’s advanced search functions, doing batches of single word and multiword long tail tests, I start to hear what Google is trying to tell me. The instant search results along the way are yet another whisper.

  43. Bill, thanks for the perspective; you seem like you witnessed the creation, the freezing and refreezing of SEO. Although SEO has only been around 14 years, roughly, the growth in this industry is unbelievable. There is so much competition nowadays to not only build communities and quality content, but rank for keywords and phrases.

  44. Hi Martin,

    Thanks. It can take a lot of time to create a blog post around a patent, even if I focus upon just the high points, and try to give a flavor of the ideas going on within one. I wish I had time to do a very thorough and complete analysis of each patent I come across, but that would take even longer.

  45. Hi Chande,

    It’s possible that Google will develop an explicit recommendation system that is similar to Facebook’s likes, but their search results themselves are evolving into a recommendation system based upon which sites certain viewers visit based upon specific queries used. Something along the lines of “people similar to you who searched for such&such viewed the following pages.”

  46. Hi Andrew,

    I think I may have seen that post, too. SEO is constantly changing and evolving, but I don’t think it’s in any danger any time soon of disappearing.

  47. Hi Matthieu,

    Thank you. I’m not quite sure of the question that you are actually asking, but I do believe that it can be really helpful for someone who is an SEO to spend time with information retrieval, learning as much as they can.

  48. Hi Julian

    The disappearance of Net Neutrality does have the potential to disrupt the web as we know it, and not just search. I hope that people challenging it think carefully about the impact that they might have. The harm they cause may be much greater than the benefits they receive.

  49. Hi Bill,

    Thanks. I agree with you, but one concern that I have is when the search engines may think that the changes they make are for the good of everyone when they may in fact not be.

  50. Hi Eric,

    Thank you for your kind words.

    It can take a while sometimes for something that I read about in a patent to impact me as well. There’s nothing like spending some time just doing a lot of searches, and observing and exploring, to help the possible impact of one of those patents sink in. Sometimes patents provide better questions than answers – and it takes that time to get a sense of how those questions might be answered.

  51. Hi Matt,

    I do, and have spent a lot of time researching and experimenting with SEO. One thing that I think still holds true, even as competition seems to have increased, is that creativity, innovation, and a unique approach can be very helpful for those who want to succeed on the Web.

  52. It’s very interesting how SEO has evolved over the years. It has changed in nature, relevance, and has made Google, a simple search engine, into one of the world’s most relevant and powerful corporations. I enjoyed reading this post and all the replies.

  53. Bill, as usual you have hit the nail on the head. Like a previous commenter, I got goosebumps about this article. I started “optimising” websites a year after Google was formed and the changes are now immense. Our SEO Company has seen many changes in that time – some good, some not so good. As discussed in previous articles I have written, I just wish Google would stick to improving algorithmic changes and not mess around with instants and the like. If they did that I think SERPs would be a lot more stable and our lives as SEOs would be much easier, don’t you think?

  54. Hi Chris,

    Thanks. At this point I’ve come to expect change from Google and the other search engines. If they didn’t change around as much as they do, SEO would be easier – for everyone. Since they do make changes often, that can sometimes be an advantage for people who keep up with the changes, and sometimes even anticipate them.

  55. I have noticed that SEO experts that concentrate on Social Media Optimization (SMO) are getting on top of Google search results quite easily. I believe that this is an area that needs urgent attention or you risk falling behind in ranking.

    John

    SEO Expert and Marketing Manager

  56. Great post,

    What’s great about the development of SEO is that it’s getting more logical and easy to understand. Even people with no technical education can now perform SEO tasks and get results.

  57. Hi John,

    I’m not sure that there’s a correlation between how SEOs use social media and rankings in organic search. There are a very wide variety of signals that the search engines use to rank pages for certain queries and keywords. The use of social media may help with things like attracting attention and traffic to pages, and increasing the numbers of bookmarks and links to those pages as well, but there are many other ways of doing that as well.

  58. Hi Sturla,

    I’m not sure that I would say that SEO is getting any easier, or any more logical. There are many basics of SEO that people can follow that can help get positive results, but there are other aspects of SEO that it can be really helpful to have a deeper understanding of to take advantage of.

  59. What a shock! I’ve been reading a number of your blog posts after an email referral from Howie ‘The Shark Never Sleeps’ Schwartz. It’s like a time warp from my Boolean days at Dialog Information Services working as a Law and Government Info Spec. inc. Patents and Trademarks. It may sound strange but I felt like a farmer away from the farm coming back to feel and smell some fresh turned soil. Rolling it around in his fingers – maybe even tasting it. How refreshing it was to read your perspective on search and the insights one can gain from studying original documents like patents (and supporting documents?). I thank you my new search source friend. May you continue to share your most appreciated investigations and insights.

  60. Hello Bill,

    Thanks for keeping us informed of what is going on in SEO.

    I like the semantic matching. Makes things easier.

    One thing (of the many) that bothers me about search is the value of popularity.

    If a site is relevant to a search, it is relevant.

    The amount of incoming links should not matter.

    More links do not mean the site is more relevant or that the quality of the information is better. More links often just mean someone did more marketing.

    Google overly rewards the big guys who do the most marketing for the most popular search terms.

    It would be fairer if Google rotated among all the most relevant results and did not favor one website over another because of more links.

    What would be interesting is if Google offered users several different ranking algorithms.

    Andrew

  61. Bill, I would like to say at first that I have no credentials with anything to do with SEO, computer engineering or IT applications and only that I am a lay person when it comes to the discussions I read on this blog. I am a business owner who wishes to utilize the internet and its various portals to improve the marketing of my products to the huge market base the internet reaches. Enough said about that.
    I appreciate reading your posts, although more often than not your thoughts and comments are heading towards destinations I cannot fathom. However, you have a gift for putting the basic concepts of your discussion subjects into a form that I can comprehend, so that I can glean the knowledge relevant to what I am trying to find out. (Thanks!) I take notes from many subjects you write about that help me in my marketing and SEO efforts. (Also there is a side of me that just likes to try and figure out techie stuff.)
    I have never posted a response before as I feel inadequate to add anything of worth to technicians in a field that I have no expertise about. However, a rather short reply from another person several days ago, Julian Young, hit a nerve. And it scares me a little. I know that in general this is not technical and you may not want to expand on something that, at this time, may only be speculative, and that you may only be able to express an opinion about, that is, “net neutrality”. I read your response to Julian and it really got me to thinking about what messing with this could do. It could shake the very core of how things work commercially and socially on the internet. Is my alarm justified or is it something that can be adapted to technically? Am I wrong or could it upset drastically the way SEO and such things are done? This may not be something you wish to delve into in this blog, and I understand that, but your opinion may carry more influence than you realize should you wish to express it. Thanks again. I look forward as usual to reading your future blogs and comments. Please keep up the good work!

  62. Bill,

    Thank you for continuing to provide an in depth discussion of SEO. Google continues to evolve and although SEO is not new, search engine capabilities are advancing. A great example of this is in how videos are read and ranked by Google. The new addition of video site maps and real voice soundtracks are a good indication of tech advancement as well as algorithm changes.

    Shira
    Internet Marketing Duru

  63. Hi Andrew,

    In the days before link analysis systems like PageRank and HITS were developed, when you performed a search at a search engine, it would attempt to match your query terms with terms it found within documents. It wouldn’t do the best job in the world of ranking those in some meaningful order, and chances were that the documents or information you might be trying to find would be buried pretty deeply within the search results.

    Looking at the quality and quantity of links to a page, so that a page is ranked both by the content upon it, and the links pointing to it really is an improvement over just matching keywords in queries to pages. There are flaws in a link analysis approach, and marketing does often play a role in how often something is linked to, but there are even more flaws in relying upon the content of a page by itself, and systems in the past that relied just upon an approach like that were prone to rank highly pages that used tricks like hidden text and keyword stuffing in meta tags and content.
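
    The idea of ranking a page by the links pointing to it can be sketched with a simplified PageRank-style power iteration. This is a textbook toy, not Google's production algorithm: the `pagerank` function name, the tiny link graph format, and the fixed iteration count are assumptions for illustration, and the 0.85 damping factor is simply the value commonly cited from the original PageRank paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank via power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score, with scores summing to about 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank
```

    In a graph where several pages link to one page, that page ends up with the highest score, and a page it links to inherits some of that weight, which is the "quality and quantity" effect described above.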

    Google does look at the content on pages, and the links pointing to pages as part of the ranking process, but it also has developed a number of other approaches to ranking and reranking pages as well. One example is the phrase-based indexing approach that I mentioned above in this post, and there are many others as well. Google tries to understand if a query is transactional, informational, or navigational, and may rank pages differently based upon that analysis. Google may also attempt to see if there is a geographic intent behind a query, and if there is it might show results that have something to do with a specific location, possibly where the searcher may be located. There are a number of different ranking algorithms that Google may decide to use based upon things such as how they interpret the intent behind a search, or based upon previous queries that you may have entered recently.

  64. Hi Val,

    Thank you very much for your kind words. I look forward to seeing you around, and to interacting with you here in the comments and discussions about things I might post.

  65. Hi Paul,

    Thank you for leaving a comment, and sharing your thoughts. I strive to try to make many of the ideas that I come across in the patents and papers that I write about as accessible as possible, while still keeping some of their flavor. It’s really good to hear from someone who doesn’t have a technical background telling me that they are learning something new and useful from my posts.

    Net Neutrality is something that has been around on the internet for a long period of time, without being expressly legislated by anyone. But, with companies like Comcast in a position where they can control some of what is being transported on the Web, and competing with some of the other businesses in areas like distributing movies, there’s a potential for the neutrality of the net to be threatened.

    It’s a topic that I’ve thought about writing about for some time, and there’s a hearing on the topic today in front of the Federal Communications Commission that might influence what we can and can’t see on the web.

    Net neutrality: US expected to ratify new rules on internet access

    I may have a post on the topic either later today, or tomorrow.

  66. Hi kipesiva,

    I’m not sure that the Length of a Document is that important a consideration in ranking pages. While it’s great to have a certain amount of content on a page to make it more likely to be indexed for terms appearing on that page, chances are that the search engines attempt to “normalize” the length of a page so that it doesn’t unduly influence how well or poorly a page might rank for words upon it.
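
    One way to picture length normalization is the term-frequency component of BM25, a standard formula from the information retrieval literature. Whether any particular search engine uses this exact formula is an assumption; the function names and the `k1` and `b` defaults below are conventional textbook choices, shown only as a sketch.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def raw_count(term, document):
    """Plain count of a term, with no adjustment for document length."""
    return tokenize(document).count(term.lower())


def normalized_tf(term, document, avg_len, k1=1.2, b=0.75):
    """BM25-style term-frequency score with length normalization.

    `avg_len` is the average document length (in words) in the collection.
    Documents longer than average are penalized; shorter ones are boosted.
    """
    tokens = tokenize(document)
    tf = tokens.count(term.lower())
    length_factor = 1.0 - b + b * len(tokens) / avg_len
    return tf * (k1 + 1.0) / (tf + k1 * length_factor)
```

    Two pages that mention a word the same number of times get the same raw count, but the longer page receives a lower normalized score, so sheer length does not unduly help or hurt a page.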

  67. Hi Shira,

    Those are good points – thank you. As the web evolves, and searchers become more interested in things like finding videos, the search engines strive to make them easier to find, and SEO necessarily has to include learning more about how a search engine might rank videos.

  68. Bill,

    What an interesting perspective you have being involved with search engine happenings since the early years. I don’t have quite as much experience, but I think the discussion here points to Google becoming increasingly effective at recognizing real value.

  69. Hi Nick,

    Thank you. Experience is a great thing to have, but it also helps to have an open mind, to want to experiment and learn new things. Google’s roadmap to the future does lie in their being able to offer better value, and better experiences, to the people who use their site. They’ve been offering better tools for site owners and web publishers and the designers and developers and consultants who work with them, like Google Analytics and Webmaster Tools. They’ve also become more transparent about many things, including much better communication on places like their blogs, help pages, and help forums.

  70. Hey Bill,
    Really awesome tips and observations. I am in my first in-house SEO position and even today I struggle to describe my actual job title. This definitely helps. Believe it or not, they had me write my own job description in my contract. Haha. Great blog, bookmarking for further use.

  71. Hi AJ,

    Thanks for your kind words. The first in-house SEO position I held was part time, and I didn’t have an official title until about 7 years in.

  72. Hi Bill.

    Just wanted to say Happy New Year to you and hope you had a nice Christmas?

  73. Hi Chris,

    Thank you. Christmas was nice, with a chance to see some people whom I hadn’t seen in a while. I hope your holidays were good ones as well.

  74. Excellent post; it really shows that the world of SEO evolves really fast.
    Thanks for sharing this info.
