How Google May Identify When Sites Transform into Doorway Pages

You go to a site that you’ve enjoyed and bookmarked sometime in the past but haven’t visited in a while, and it’s changed. The topics it discusses are different, or the writing style isn’t quite the same, or it suddenly has links within its content to commercial pages that it probably wouldn’t have linked to before, or all of those things. It also seems heavily focused upon more commercial terms and content. It’s changed, and its pages now have the appearance of what many might call “doorway pages.”

Doorway pages have also been referred to by terms like gateway pages, entry pages, bridge pages, and portal pages. Their primary purpose is to attract visitors from search engines in order to send them to other places.

As a site owner, you don’t want Google to start identifying your pages as doorway pages. Google’s Webmaster Guidelines tell us to:

Avoid “doorway” pages created just for search engines, or other “cookie cutter” approaches such as affiliate programs with little or no original content.

Doorway pages tend to have fairly low quality content, and are written primarily to rank well for specific terms or phrases within search engines for the purpose of funneling traffic to another destination. A Google patent application published today describes how it might identify pages that have been transformed into doorway pages to point searchers to other sites.

The patent filing is a little unusual in that it’s what is known as a divisional patent, which means that it contains material from a previously filed patent but focuses on only one aspect of that patent, which could be seen as a separate invention.

A few days ago, in my post Revisiting Google’s Information Retrieval Based Upon Historical Data, I wrote about Google resuscitating a patent originating from their Historical Data patent, which caused a big stir in the mid 2000s. Last week, Google filed a divisional patent application based upon the Historical Data patent, titled Document Scoring based on Document Content Update.

A new patent application from Google was published at the USPTO today. It has the same name as the divisional patent filed last week, as well as the same description, but the claims contained within the patent application are very different, and focus upon pages that may have transformed into doorway pages. The patent application is:

Document Scoring based on Document Content Update
Invented by Anurag Acharya, Matt Cutts, Jeffrey Dean, Paul Haahr, Monika Henzinger, Urs Hoelzle, Steve Lawrence, Karl Pfleger, Olcan Sercinoglu, and Simon Tong
Assigned to Google
US Patent Application 20110264671
Published October 27, 2011
Filed June 30, 2011

Abstract

A system may determine a measure of how a content of a document changes over time, generate a score for the document based, at least in part, on the measure of how the content of the document changes over time, and rank the document with regard to at least one other document based, at least in part, on the score.

The new claims focus upon instances where the content of a page changes so much that it has quite possibly become a doorway page. The patent description tells us in one section what Google might be keeping an eye out for:

Document Topics

[0114] According to an implementation consistent with the principles of the invention, information regarding document topics may be used to generate (or alter) a score associated with a document. For example, search engine 125 may perform topic extraction (e.g., through categorization, URL analysis, content analysis, clustering, summarization, a set of unique low frequency words, or some other type of topic extraction).

Search engine 125 may then monitor the topic(s) of a document over time and use this information for scoring purposes.

[0115] A significant change over time in the set of topics associated with a document may indicate that the document has changed owners and previous document indicators, such as score, anchor text, etc., are no longer reliable.

Similarly, a spike in the number of topics could indicate spam. For example, if a particular document is associated with a set of one or more topics over what may be considered a “stable” period of time and then a (sudden) spike occurs in the number of topics associated with the document, this may be an indication that the document has been taken over as a “doorway” document.

Another indication may include the disappearance of the original topics associated with the document. If one or more of these situations are detected, then search engine 125 may reduce the relative score of such documents and/or the links, anchor text, or other data associated the document.

A change analysis might be performed on a page to see how the topics previously associated with it have altered. That analysis might look at changes in the categories that would be associated with the page, the links pointing to and from the page, the page’s content, whether different pages would now be grouped with it if it were clustered together with similar pages, and how it might now be summarized using a document summary approach (as is done for the creation of snippets in search results).

If a page might have ranked well for a specific topic in the past, and content about that topic has been removed, Google might take that as a sign that the page is now webspam.

If there’s a spike in the number of topics associated with the page, that could also be a sign that the page has now become a doorway page.

So, for example, if a site about fishing loses its content on fishing-related topics, and its pages now include information about unrelated topics such as weight loss or travel, those pages might be perceived as having become doorway pages.
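
The two warning signs described above, a spike in the number of topics and the disappearance of the original topics, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the patent filing gives no formulas or thresholds, so the function names, the `spike_ratio`, `lost_fraction`, and `penalty` values, and the idea of representing each crawl of a page as a simple set of topic labels are all hypothetical, not Google's method.

```python
from dataclasses import dataclass


@dataclass
class TopicChange:
    spike: bool           # sudden jump in the number of topics
    originals_lost: bool  # the long-standing topics have disappeared


def assess_topic_change(history, latest, spike_ratio=2.0, lost_fraction=0.5):
    """Compare the latest topic set of a page against its stable history.

    history: non-empty list of topic sets from earlier crawls (the "stable" period).
    latest: topic set from the most recent crawl.
    spike_ratio and lost_fraction are made-up thresholds for illustration only.
    """
    if not history:
        raise ValueError("history must contain at least one earlier snapshot")

    # Topics present in every earlier snapshot are treated as the originals.
    originals = set.intersection(*history)
    # Average number of topics per snapshot during the stable period.
    baseline_count = sum(len(topics) for topics in history) / len(history)

    spike = len(latest) >= spike_ratio * baseline_count
    missing = originals - latest
    originals_lost = bool(originals) and len(missing) / len(originals) >= lost_fraction
    return TopicChange(spike=spike, originals_lost=originals_lost)


def adjusted_score(score, change, penalty=0.5):
    """Reduce a page score when either warning sign is present (hypothetical rule)."""
    if change.spike or change.originals_lost:
        return score * penalty
    return score


# The fishing example: stable for three crawls, then taken over.
fishing_history = [
    {"fishing", "bait"},
    {"fishing", "bait"},
    {"fishing", "bait", "boats"},
]
latest_crawl = {"weight loss", "travel", "poker", "hotels", "insurance"}

change = assess_topic_change(fishing_history, latest_crawl)
print(change, adjusted_score(10.0, change))
```

In this sketch the fishing page trips both checks, since its topic count roughly doubles and none of its original topics survive, while a page that merely adds one related topic to the same history would trip neither. A real system would presumably combine signals like these with the link and anchor text data the patent mentions rather than rely on topic labels alone.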

Conclusion

When the historical data patent came out, it read like a grab bag of loosely related ideas on how Google might identify stale pages, or pages that transformed over time to become web spam, so it’s not a surprise that we’re starting to see the ideas and processes from that patent being split up and refiled as divisional patents.

I’ve lost a few of my saved bookmarks to changes made by new owners, who removed old content and replaced it with unrelated new content or filled the site with advertisements. People do make a practice of buying older sites and changing them, sometimes with the motivation of making improvements, and sometimes inspired by being able to funnel the traffic those sites were receiving to other pages.

People also sometimes create fresh doorway pages, but it’s not unusual for doorway pages to be built upon older content found at sites that might have been abandoned, and then sold.

It’s also possible that some site owners might decide to change and update the content on their sites, to focus upon different products and topics, and to launch different services on their pages. It might not be a bad idea to do some of that on new pages with different URLs.

55 thoughts on “How Google May Identify When Sites Transform into Doorway Pages”

  1. Buying old sites, especially ones with “grandfathered” PR is huge in the internet marketing industry.

    I know people who buy these constantly and perform 301 redirects to the root index AS a 404.php

    This essentially funnels the link juice, trust and authority from any and all links still out there pointing to any pages that have historically existed on that domain…all to the root index.

    Interesting trick. I can only assume that that type of activity would only increase the possibility of being “tagged” as a doorway page…either manually or algorithmically…

    Mark

  2. Very interesting. Looks like the jig is up for the majority of link brokers buying domains with backlinks and PR to use for selling links. I’ve seen a large number of them fall (deindexed, lose PR, etc.) in the last year and this might be a reason why.

  3. Interesting info again. It seems logical from their standpoint, and mine. Content is constantly changing and Google has no reason to believe that all changes are innocent. In fact, as Mark said, these types of domains live on for years. This is one more reason to make sure your content is well-linked before adding additional content, IMHO.

  4. From all my observations – the AGE of a domain is clearly NOT a highly-weighted ranking factor – when there are so many other site metrics that determine the real strength of a website. It also takes a tremendous amount of webmaster skills to _preserve_ an aged site’s link structure, topic, and theme intact to be of much worth moving forward. Otherwise it only takes a PageRank update or three for the back link juiciness to drain right out of it. Add Link-Decay to that, and a flipped domain’s value can decline rapidly. As such I’ve never been a buyer in the ‘instant gratification’ flip market: You’re really only buying the past – with no guarantee of the future in many instances.

  5. Hi Bill,
    It does seem that all too often people want something for nothing. It is a shame that these old sites can’t be put to use in a way that would be acceptable and not a quick backdoor for someone to run a scam through. Google seems to take their policing job seriously. That is a good thing for us. I enjoyed your article.

  6. Domain age is not very important.. Constant link building and social media campaign keep you ‘alive’ and busy…..

  7. Interesting, I guess the advice is to make sure if you are adding content to older pages you stay within the on-page topic. Where in the past you may have looked for an old strong domain or page and tried to leverage the weight for a new topic, that may not really be the best way to do it. As you have said, if you want to launch something new then create a new page or section for that on your site.

  8. It’s pretty interesting to me, because I still see some obvious “gateway and bridge” pages around that are ranking well for competitive terms. Maybe panda hasn’t come around and got them yet. But, it’s obvious if 90%+ of your content has CPA links and ads, with low content and spelling errors. I think Google is cluing into a lot of those methods better and better, working out the kinks and such.

  9. I have always found the categorization of pages with affiliate links as ‘doorways’ strange. Some product reviews that contain affiliate links do a better job of representing the product than the site selling it! So, I feel that categorizing all pages with affiliate links as doorway pages is not very accurate.

  10. atm u get a panda slap if you have 3 posts all of different topics – this is calculated in densities of high competitive KW – if Google finds “web design” and “car rental” on same page – classic slap! wake up (x color) hats! need new ideas lol

  11. I think this is a very positive step from Google. It indicates that Google gradually appreciates hard work as opposed to using various tricks to achieve “unearned” ranking. The old domains issue is a bit tricky. As I understood you, it gets dangerous to use an old domain. What if you finally manage to get a hold of the domain which best fits your business? Would you then recommend not using it at all, or would e.g. a 301 redirect to your existing domain be ok?

  12. Interesting reading. There’s a lot of this activity on this from SEOs around the world but if the content is of a decent quality then I don’t see it being a problem. If it’s poor quality, then you get what you deserve really.

  13. It’s good to hear that the days of turning around what used to be solid and content rich sites into a profit driven platform (gateways)might soon come to an end. For sure, it is the honest webmasters that need to be sure they don’t do anything that could be considered a similar practice.
    Creating new pages and new urls does seem the right way around this as mentioned above. I wonder though, if the problem is related to the homepage and not deeper pages, is the only option then to use a new domain?

  14. Very informational post, but what about websites linking to their other important posts within content? I have noticed they often promote or link to pages that are not relevant. These are not considered as doorway pages! What about heavy inter-linking?

  15. It would be nice if they were penalized for 1 main reason, there are only a finite number of decent domains names. If a company/person does not want it any more that someone else should be able to get hold of it for legitimate purposes. companies should not be able to horde domains that are nothing to do with what they sell.

  16. One of the most famous sites to have doorway pages is probably BMW, and for a while Google kicked them off their search engine. From what I have read on the blog scene, they put up loads of pages about second hand cars and then pointed them to the BMW site. I suppose in a way Google must have found it by thinking, what do second hand cars have to do with brand new ones?

  17. Hi Mark,

    I suspect that the value of buying old domains to redirect them and gain value from them will probably decrease in the future.

  18. Hi Brett,

    I suspect that Google is paying more attention to these kinds of practices, and will do even more of that in the future. This kind of practice for building links has the potential to have the bottom fall out of it at any time.

  19. Hi Darren,

    I’ve always been somewhat cautious about making changes to pages that change the character of those pages too much, or too quickly, and this patent reinforces that concern.

  20. Hi Glenn,

    I agree with you about the value of the age of a domain, and I think I’ve seen some place a little too much emphasis upon that. There are a lot of little signals that a search engine might look at that can bring it to devalue the worth and weight of an old site that might have been purchased to use as a doorway or a redirect, so it’s not something that I would be willing to rely upon as a long term tactic for success.

  21. Hi Ann,

    Thank you. I often see approaches like flipping sites to use them as doorways as being similar to building a house out of a deck of cards. You never know when a strong wind might come along and blow down the whole thing.

  22. Hi Simon,

    As well as building pages that people find interesting, engaging, and useful enough to refer others to, link to, and so on.

  23. Hi Stuart,

    If Google is constantly monitoring changes made to pages, looking for topics that disappear or a sudden increase in the types of topics that might be added, and that seems to be the focus of this patent filing, then yes, keeping within topic seems to make sense.

  24. Hi Raj,

    That’s not what this patent is pointing at. And, if you managed to get a look at the recently leaked Google Evaluator’s handbook, you may have seen their discussion about affiliate pages and how they state that thick affiliate pages that add value independently of their affiliate links aren’t spam.

    But imagine the scenario where someone buys a site that’s been around for a while, adds a lot of new content and removes old content, and includes a number of links on its pages, whether affiliate or not (I don’t believe the patent even mentions the word affiliate). That’s what the search engine is looking for. As it noted in the section that I quoted above:

    A significant change over time in the set of topics associated with a document may indicate that the document has changed owners and previous document indicators, such as score, anchor text, etc., are no longer reliable.

  25. Hi Ron,

    Good point. If a page is a news site, then the search engines know that it might have multiple topics displayed on a regular basis, but if a page is about the best way to landscape your yard over a period of years, and then it suddenly contains information about how to win at poker, the best way to lose weight, and how to get discounts on hotel stays, that’s a pretty strong signal to Google that it should no longer put as much weight into the anchor text pointing to the page about “landscaping.”

  26. Hi Morten,

    If you find a domain that matches your business well, it makes sense to use it. Just don’t rely too much upon old links and anchor text that may have been pointing to it in the past. I’d also recommend that you look and try to make sure that the previous owners weren’t doing things that might have gotten the domain penalized in the past either.

  27. Hi Paul

    Regardless of the quality of content on a site, what this patent is pointing to is the risk that if you make changes that Google sees as a change of ownership and site purpose, and it seems that you are using it as a way to direct people to some other unrelated place or places, then you may lose the value of links, anchor text, and rankings that page may have had in the past.

  28. Hi Eliseo,

    There is a real possibility that someone might do something like buy a site that’s related to their business that has been abandoned because they think the people who visited the old site might find their site of value as well. For example, one of my favorite old sites was a humorous legal guide to the Web. If a site like Nolo Press bought the site, and maintained it, and added some related updates and some links to their site, would they have a problem with the way this patent is written? I don’t think that they would, but they should know that there’s a potential risk.

    If instead, a site that sells auto parts bought the old legal site, and started including a lot of links to their site from it and add unrelated content to many of the pages of the site, the older site might begin to lose its link value and anchor text value under this patent.

  29. Hi Ahmad,

    This patent really doesn’t cover the idea of a single site interlinking to other pages of the same site, but if a site that has been around without much change on most of its pages for a while suddenly starts providing unrelated links from a lot of those older pages to pages on other sites, that might be the kind of signal that this patent is looking for.

    One of the important elements here has to do with changes that happen over time. Should older rankings for that page based upon old links and old anchor text and changes to the topics that it might have been categorized under change when the page does? It would seem to make sense that they do.

  30. Hi Kevin,

    The BMW scenario seems to be a pretty good example of what Google thinks about doorway pages in general. I don’t recall if those pages were newly created pages or ones that existed on another site, and were suddenly added to that site. This patent focuses pretty much upon pages that are repurposed to become doorway pages, and how Google might identify those.

  31. Hi James,

    I agree with you that it would be great to see more domains become available when the people originally using them decide that they no longer want or need them, but I can also understand why someone who owned a domain like “business.com” might prefer to sell it rather than let it expire.

    The tough part of that is that a change in the ownership of a site or domain isn’t necessarily a change in the business behind the site, or the type of business that the new owners might conduct, and sometimes people do acquire domains that are related to what they sell.

    In this patent, Google appears to be willing to wait to see if new content appearing on a site and/or old content disappearing signals enough of a change so that it seems the reason for making the purchase had little or nothing to do with the actual domain name, and more to do with trying to use things like the rankings of the pages of that site to send people to something elsewhere that may be completely unrelated.

  32. Hi Bill,

    Great post, but i had a question. What if i 301 redirect those old pages to new pages? Please note that the old pages had content of lets say category 1 and the new pages have content belonging to a whole new category i.e. Category 2, on one hand i am transferring all the SEO value and on the other hand as per this latest patent i am running the risk of presenting myself to Google as a doorway page/website.

    Let me know if i am missing out on anything.

    - Sajeet

  33. >>And, if you managed to get a look at the recently leaked Google Evaluator’s handbook<<

    Bill, would you happen to have a link? I did a quick search and couldn't find it on the web.

  34. This takes “neighbourhoods” to the next level, given we already have them with links. Kinda makes sense. But I suspect the patent usage will be more towards PR hyped sites which sell blog posts….by the dozen.

  35. Hi Sajeet,

    When Google crawls sites and comes across 301 redirects, they usually place those off to the side for further analysis. Usually a redirect will pass along most of the PageRank (but not all of it) that a direct link would. Matt Cutts mentioned earlier this year in a video that there is a reduction of the amount of PageRank that is being passed along so that people get more benefit out of direct links than redirects when both are a viable option.

    Chances are that part of the analysis that could be undertaken may involve decisions on how much PageRank to actually pass along. See my post How Google Might Filter Out Duplicate Pages from Bounce Pad Sites for one example of how they might attempt to understand whether or not to pass along PageRank when they come across a lot of redirects on a site.

    This patent itself doesn’t focus upon when a domain is purchased and redirected to another site, but rather when a site might change hands and the content is changed to reflect a new use or to try to gain value from PageRank from links that were pointed to the page when it might have been different and held a different purpose. It’s silent on that issue.

    But that doesn’t mean that Google isn’t looking at the situation that you describe.

  36. Hi jjray,

    If you search under [google raters general guidelines], there’s a good chance that you’ll find a copy of it online. I’m seeing one there.

    I understand that the person who originally found it through a websearch and posted it was asked by Google to remove it and did.

  37. Hi JC,

    This is sort of the situation where people have linked to pages that weren’t “bad neighborhoods,” but may have become that after the fact.

    Chances are that its focus is on blogs that have been taken over by others who may be using them to sell links, or websites that were bought to funnel away traffic and PageRank.

  38. Hi Ron,

    You’re welcome. The value of a well rounded and holistic approach to internet marketing means that you are less likely to have the bottom drop out on you at any moment in time. Do all the basics in terms of SEO such as making your site easy to crawl and avoid duplicate URLs pointing to the same pages and so on, and then interact with others on the Web in meaningful ways, get involved with social networks, submit your pages to quality directories and business profile type sites, participate with people and businesses in your community and develop relationships that could lead to links, do newsworthy type things that could raise your profile online and off.

    It amazes me when I see businesses that could easily attract links by donating to charities, sponsoring educational opportunities and events online, interacting with other businesses in meaningful ways, producing great resources that many people would link to, and so on, would instead spend the same amount of money buying links.

    Imagine for instance, if Sears started a blog where every other post, or every third post showed us something that Sears offered in its catalog from the 1800s or early 1900s, and told us about the history behind those products, like for instance, prefabricated homes that were shipped out west during homesteading days in the 19th century. They could get historians involved and produce videos for those things as well.

  39. Thanks for approving my theory Bill! I try and stay away from these “elite” networks as I have seen many people drop big-time in the last few month – it all seems great in the first year or two and the ranks increase… but sooner or later ..oops, all gone!

  40. Hi Bill,

    Thank you for the response. So one thing that I understand and would like to conclude is that this new patent focuses more on the category of content in conjunction with the frequency and not just the change in category of content. I believe that incase the domain shifts owners only once then the probability of the new website being considered as a doorway page is less, as compared to a case where shift takes place like ten times, then the domain might be considered as a doorway page.

    - Sajeet

  41. Hi Sajeet,

    You’re welcome.

    It’s likely that Google would also look at how much of a change has taken place, in addition to the frequency of change. Keep in mind also that the approach described in this patent wouldn’t happen in isolation either. I mentioned an earlier version of the patent, which focused upon anchor text pointing to the page as well. If there are a number of links pointing to the page that may have used anchor text somewhat related to the content of the page, and now none of it is close to being related to that content, it’s another signal that the page has changed in some significant ways, and possibly should no longer gain the benefit of those links.

  42. Thanks Bill. I did a google search under “google raters general guidelines” and various results are shown but the links on those pages redirect from the PDF google document to another web page on the same URL (mauriziopetrone.com) that talks about doc but lacks the doc text. All the search results I found also link to the PDF URL that is no longer valid. A few sites do give summaries of the information that the document contained so that is helpful.

  43. Hi jjray and Darren,

    Interesting that you’re having difficulty finding a copy of that. When I wrote my response, I was able to access at least one copy under that search.

    Hopefully you can see the original, because it contains a lot of information that I don’t think will harm Google if shared with a larger audience.

  44. Google using and patenting this algorithm just means that they really can’t judge accurately whether the page is “doorway” or not based on the page itself. If they could, such algorithm would be unnecessary.

    Whatever the definition of a doorway page is, is a page more “doorway” if it has been changed recently?

    Why would a page (that has changed recently) containing multiple topics be a doorway page, but the SAME page (that has not changed) would NOT be a doorway page?

    Just some food for thought.

    Sher

  45. I have the same opinion as some of the other guys above. I think that DA isn’t as important as the quality and quantity of backlinks and fresh content.

  46. Hi Sher,

    Good questions.

    The patent isn’t so much about finding all doorway pages, but rather finding pages that Google might have trusted in the past that have transformed into doorway pages, and using those changes as an indication of when that might have happened.

    If someone creates a doorway page right off, it might not accumulate a lot of links on its own. But if it’s a popular resource page, for example, that a lot of people have linked to and cited, and someone buys the site that it is on and adds a bunch of unrelated links and information to try to siphon off some traffic and pagerank, it’s likely going to have a much greater impact than the doorway page that’s always just been a doorway page.

    So, rather than trying to find a way to identify just doorway pages that may pass along limited amounts of PageRank, Google describes a way to identify doorway pages that might cause a lot of harm.

    Of course, someone might create a lot of doorway pages that are doorway pages from the start, and use quantity rather than quality to attempt to manipulate rankings. In that case, rather than looking for changes to the content of those pages, it makes much more sense for Google to look for unnatural linking patterns, such as one page being linked to by a very large number of very low ranking pages (and often not much else).

  47. Hi D3nnnis,

    I’m guessing by DA, you mean domain authority. It’s hard to tell if the search engines are using that, and in what manner, but if a page is transformed in a way that might appear suspicious, this patent does seem to indicate that Google might try to understand what is happening and take some kind of action in response.

  48. Bill,

    Point well taken. It is more about backlinks and more about how webmasters award backlinks (and don’t remove backlinks if the target page changes), isn’t it, rather than the page’s content.

    What do you think about this: the backlinks should apply to the page as it was when the backlinks were given. If the page retains say 80% of the topics over time, maybe Google should count that page backlinks to 80% as well, right. (just an initial suggestion).

    Of course, this is just a textual example. One can transfer a page into a doorway page just by changing the images and perhaps outgoing links, so … that is a whole other patent for Google to tackle.

  49. Hi Sher,

    Someone might attempt to change the content on a page to get it to rank well for something else so that it takes advantage of the pagerank that page had attained, and ranks to get visitors to send off somewhere else, instead of just adding links to take advantage of PageRank. That might account for a lot of changes to the content of a page. That might result in the page no longer being relevant for a topic that it once was relevant for, or being relevant for a number of new topics. This particular patent sort of focuses upon that.

    There are other patents that are related, and they look at other things that might change on a page over time, so for instance, when more subtle changes to content on a page are made so that links (often very unrelated) are incorporated into existing text w/o too many changes to content, that sends a different signal that the search engine might look at.

    It is possible that Google might lessen the value of links from a page that’s changed like this, and demote the page in rankings somewhat because of these kinds of changes.

    Each of these patents by itself isn’t necessarily earth shaking, and each often presents just one small aspect of what the search engine does. The one that you suggest, about how only images and links might be changed, would possibly make a good related patent. Google had to file divisional patents like this one on doorway pages because the original historical data patent was just too broad.

  50. Hi Bill,
    Having read this post started me thinking if by using Yahoo generated content via developer means to my wordpress site could this be considered doorway pages? My intentions are to provide a enhanced visitor experience i.e posts, Youtube, Wikipedia that sort of thing. I prefer to think of it as content curation, these are not affiliated links at all just pure content and in some cases free pdfs can be accessed from my site rather than leave the page a visitor is on. Naturally I cannot produce this content at the rate Yahoo can but it is credible content obviously. I also would ask if the content was considered duplicate wouldnt the adsense ads go to public service mode or does that not happen anymore. Just asking.

  51. Hi Richard,

    Tough question. I don’t know if what you are doing might be considered by Google to be curating content, scraping it, adding to what you post, or exactly what, from just reading your description of it.

    There is a difference between copying something, and aggregating it in a meaningful way.

    I’m not sure that you can rely upon the behavior of adsense ads to get a sense of how Google’s indexing system might handle the content you publish either. I’m not familiar with Adsense going into public service mode based upon the content of a page. I just don’t have experience with that, but I wouldn’t want to necessarily rely upon it as a signal of how Google is going to treat your content.
