How Google May Identify When Sites Transform into Doorway Pages
You go to a site that you’ve enjoyed and bookmarked sometime in the past but haven’t visited in a while, and it’s changed. The topics it discusses are different, or the writing style isn’t quite the same, or it suddenly has links within its content to commercial pages that it probably wouldn’t have linked to before, or all of those things. It also seems heavily focused upon more commercial terms and content. It’s changed, and its pages now have the appearance of what many might call “doorway pages.”
Doorway pages have also been referred to by terms like gateway pages, entry pages, bridge pages, and portal pages. Their primary purpose is to attract visitors from search engines in order to send them to other places.
As a site owner, you don’t want Google to start identifying your pages as doorway pages. Google’s Webmaster Guidelines tell us to:
Avoid “doorway” pages created just for search engines, or other “cookie cutter” approaches such as affiliate programs with little or no original content.
Doorway pages tend to have fairly low quality content, and are written primarily to rank well for specific terms or phrases within search engines for the purpose of funneling traffic to another destination. A Google patent application published today describes how it might identify pages that have been transformed into doorway pages to point searchers to other sites.
The patent filing is a little unusual in that it’s what is known as a divisional patent, which means that it contains material from a previously filed patent but focuses on only one aspect of that patent, which could be seen as a separate invention.
I wrote a few days ago about Google resuscitating a patent originating from their Historical Data patent, which caused a big stir in the mid-2000s, with my post, Revisiting Google’s Information Retrieval Based Upon Historical Data. Google filed a divisional patent application last week, Document Scoring based on Document Content Update, which is based upon the Historical Data patent.
A new patent application from Google, published at the USPTO today, has the same name as the divisional patent filed last week, as well as the same description. The claims contained within the new patent application are very different, though, and focus upon pages that may have transformed into doorway pages. The patent application is:
Document Scoring based on Document Content Update
Invented by Anurag Acharya, Matt Cutts, Jeffrey Dean, Paul Haahr, Monika Henzinger, Urs Hoelzle, Steve Lawrence, Karl Pfleger, Olcan Sercinoglu, and Simon Tong
Assigned to Google
US Patent Application 20110264671
Published October 27, 2011
Filed June 30, 2011
A system may determine a measure of how a content of a document changes over time, generate a score for the document based, at least in part, on the measure of how the content of the document changes over time, and rank the document with regard to at least one other document based, at least in part, on the score.
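To make that abstract a little more concrete, here is a minimal Python sketch of the three steps it names: measure how content changes, turn that into a score, and rank documents against each other. The patent doesn't say which measure or scoring rule Google uses, so everything here is an assumption on my part: the Jaccard distance over word sets, the penalty weight, and the function names are all hypothetical choices for illustration.

```python
import re

def tokens(text):
    """Lowercased set of words found in one snapshot of a page."""
    return set(re.findall(r"[a-z']+", text.lower()))

def change_measure(old_text, new_text):
    """Jaccard distance between two snapshots (an assumed measure).

    0.0 means the vocabulary is identical, 1.0 means nothing is shared.
    """
    old, new = tokens(old_text), tokens(new_text)
    if not old and not new:
        return 0.0
    return 1.0 - len(old & new) / len(old | new)

def adjusted_score(base_score, change, penalty=0.5):
    """Hypothetical rule: scale the base score down as the change grows."""
    return base_score * (1.0 - penalty * change)

def rank(docs):
    """docs maps name -> (base_score, old_text, new_text); best name first."""
    scored = {
        name: adjusted_score(base, change_measure(old, new))
        for name, (base, old, new) in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)
```

In this sketch, a page that keeps its vocabulary holds on to its base score, while a page whose words have been replaced wholesale loses up to half of it. A real system could just as easily reward some kinds of change, such as fresh updates on the same topic.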
The new claims focus upon instances where the content of a page changes so much that it has quite possibly become a doorway page. The patent description tells us in one section what Google might be keeping an eye out for:
 According to an implementation consistent with the principles of the invention, information regarding document topics may be used to generate (or alter) a score associated with a document. For example, search engine 125 may perform topic extraction (e.g., through categorization, URL analysis, content analysis, clustering, summarization, a set of unique low frequency words, or some other type of topic extraction).
Search engine 125 may then monitor the topic(s) of a document over time and use this information for scoring purposes.
 A significant change over time in the set of topics associated with a document may indicate that the document has changed owners and previous document indicators, such as score, anchor text, etc., are no longer reliable.
Similarly, a spike in the number of topics could indicate spam. For example, if a particular document is associated with a set of one or more topics over what may be considered a “stable” period of time and then a (sudden) spike occurs in the number of topics associated with the document, this may be an indication that the document has been taken over as a “doorway” document.
Another indication may include the disappearance of the original topics associated with the document. If one or more of these situations are detected, then search engine 125 may reduce the relative score of such documents and/or the links, anchor text, or other data associated the document.
A change analysis might be performed on a page to see how the topics previously associated with it have altered. That analysis could look at changes in the categories that would be associated with the page, the links pointing to and from it, the page’s content, whether different pages would now be grouped with it if it were clustered together with similar pages, and how it might now be summarized using a document summary approach (as is done for the creation of snippets in search results).
If a page might have ranked well for a specific topic in the past, and content about that topic has been removed, Google might take that as a sign that the page is now webspam.
If there’s a spike in the number of topics associated with the page, that could also be a sign that the page has now become a doorway page.
So, for example, a site about fishing whose pages lose their fishing-related content and now cover unrelated topics such as weight loss or travel might be perceived as having turned into doorway pages.
When the historical data patent came out, it read like a grab bag of loosely related ideas on how Google might identify stale pages, or pages that transformed over time to become webspam, so it’s not a surprise that we’re starting to see the ideas and processes from that patent being split up and refiled as divisional patents.
I’ve lost a few of my saved bookmarks to changes made by new owners, who removed old content and replaced it with unrelated new content or filled the site with advertisements. People do make a practice of buying older sites and changing them, sometimes with the motivation of making improvements, and sometimes inspired by being able to funnel the traffic those sites were receiving to other pages.
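The two signals from the patent description, a spike in the number of topics and the disappearance of the original topics, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: topic extraction is taken as already done (each snapshot is just a set of topic labels), and the function name, the three-snapshot "stable" window, and the doubling threshold are hypothetical values that don't come from the patent.

```python
def doorway_signals(history, stable_window=3, spike_factor=2.0):
    """Compare the latest topic set of a page against its stable period.

    history: list of sets of topic labels, oldest snapshot first.
    stable_window and spike_factor are assumed, illustrative thresholds.
    """
    if len(history) <= stable_window:
        raise ValueError("need more snapshots than the stable window")

    stable = history[:stable_window]
    latest = history[-1]

    # Every topic seen during the stable period counts as "original".
    baseline = set().union(*stable)
    avg_count = sum(len(topics) for topics in stable) / stable_window

    lost = baseline - latest
    return {
        # Sudden jump in how many topics the page covers.
        "topic_spike": avg_count > 0 and len(latest) >= spike_factor * avg_count,
        # Original topics that no longer appear on the page.
        "lost_topics": sorted(lost),
        # True only when every original topic has disappeared.
        "originals_gone": bool(baseline) and lost == baseline,
    }

fishing_site = [
    {"fishing", "bait"},
    {"fishing", "bait"},
    {"fishing", "lures"},
    {"weight loss", "travel", "loans", "casino", "insurance"},
]
signals = doorway_signals(fishing_site)
```

For the fishing site above, both signals fire: the page jumps from about two topics to five, and none of its original topics survive. A page that simply adds one related topic, say boats, would trip neither check with these thresholds.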
People also sometimes create fresh doorway pages, but it’s not unusual for doorway pages to be built upon older content found at sites that might have been abandoned, and then sold.
It’s also possible that some site owners might decide to change and update the content on their sites, to focus upon different products and topics, and to launch different services on their pages. It might not be a bad idea to do some of that on new pages with different URLs.