How Google May Identify When Sites Transform into Doorway Pages

You go to a site that you’ve enjoyed and bookmarked sometime in the past but haven’t visited in a while, and it’s changed. The topics it discusses are different, or the writing style isn’t quite the same, or it suddenly has links within its content to commercial pages that it probably wouldn’t have linked to before, or all of those things. It also seems heavily focused upon more commercial terms and content. It’s changed, and its pages now have the appearance of what many might call “doorway pages.”

Doorway pages have also been referred to by terms like gateway pages, entry pages, bridge pages, and portal pages. Their primary purpose is to attract visitors from search engines in order to send them to other places.

As a site owner, you don’t want Google to start identifying your pages as doorway pages. Google’s Webmaster Guidelines tell us to:

Google’s Exact Match Domain Name Patent (Detecting Commercial Queries)

One question I’m sometimes asked is whether people should choose a domain name that includes the name of their business or brand, or use keywords within a domain name to make it easier to rank for those keywords in Google and the other search engines. I often explain that while a keyword domain (often referred to as an exact match domain, or EMD) may help them rank for the phrase chosen, I usually prefer domain names using a brand, and that the best domain names tend to be somewhat short, memorable, and easy to spell, with emphasis on the “memorable.”

I have seen a lot of discussion on the Web about keywords in domain names, with a number of people discussing their experiments with exact match domains and how those may help a site to rank for terms used in the domain name. The following video was uploaded to the Google Webmaster Help Channel this past March, with the Head of Google’s Web Spam team, Matt Cutts, answering the question, “How would you explain ‘The Power of Keyword Domains’ to someone looking to take a decision what kind of domain to go for?”

Authority vs. Popularity in Search Engine Rankings

When search engines return web pages in search results in response to a query, most people assume that the pages being shown are the ones that a search engine has decided are the “best” pages in response to their search terms. But what does the word “best” mean in that context? The search engines attempt to show pages that are both relevant to the query (and the intent of a searcher) and popular.

Google’s PageRank algorithm is a popularity algorithm based upon a citation analysis approach to finding pages, or as Google Founder Larry Page noted in Improved Text Searching in Hypertext Systems (pdf):

The intuition is that if your query matches tens of thousands of documents, you would be happier looking at documents that many people thought to mention in their web pages, or that people who had important pages mentioned at least a few times.
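The intuition in that quote can be sketched as a small power-iteration calculation. This is only a minimal illustration of the commonly published form of PageRank, not Google's production method. The toy link graph, the function name `pagerank`, the damping factor of 0.85, and the even redistribution of rank from pages without outgoing links are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate link-based popularity scores for a small graph.

    links maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank among the pages it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads
                # its rank evenly over every page.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" both mention "c", and "c" mentions "a".
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph the page mentioned by two others ends up with the highest score, and the page that nobody mentions ends up with the lowest, which is the behavior the quote describes.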

Revisiting Google’s Information Retrieval Based Upon Historical Data

Can patents be said to have family histories? If so, this post is going to introduce a barely known ancestor of one of the most written-about search-related patents on the Web, as well as a brand new grandchild of that patent.

The patent is Google’s Information retrieval based on historical data, which was filed in 2003, and granted in 2008. When it was published as a pending patent application in 2005, it created a pretty big stir amongst the forums and blogs of the search community.

The patent has two focuses, both of which take advantage of recording changes to a site over time. One is to help identify web spam, and the other is to help avoid stale documents being returned in response to a query. It raised questions among SEOs, such as how important the ages of domains and of links are, as well as:

  • Does Google favor fresher sites over older sites, or older sites over fresher sites?
  • Even more, how does Google weigh the age of a website?
  • Are the search engines looking at whois data to see who owns websites, and if there has been a change of ownership?
  • If the content of a site changes, and the anchor text pointing to it remains the same even though it’s no longer relevant, will it still rank for the terms in the anchor text?
  • If you buy a website and make changes to it, will the PageRank for that site start to evaporate or expire?

How a Search Engine May Automate Web Spam Reports and Search Feedback

How much does feedback from searchers impact the search results that we see at Bing or Google? How do those search engines process and respond to that feedback?

The links that Google and Bing present for searchers to provide feedback on search results are listed at the bottoms of the search results pages for each. If there were instead a link after each search result where someone could provide feedback, how much of an impact would that change have, and would the search engines be able to handle the feedback that they received?

A patent granted to Microsoft this week describes how the search engine may automate processes for “dissatisfaction reports” that are manually submitted by searchers, and how the search engine may file its own dissatisfaction reports in some instances. While some of the feedback that search engines receive may include web spam reports, they may also receive feedback that something is “broken” with the search engines, or that a URL that should be showing for a specific query isn’t, or that the results just weren’t helpful.

Providing Feedback at Bing and Google

How Google Might Filter Out Duplicate Pages from Bounce Pad Sites

I hadn’t heard the term “Bounce Pad” applied to websites before, but it’s useful to know the language of search engines, and the things they might look for when crawling and indexing webpages and serving results to searchers. Determining whether a site is a bounce pad involves an analysis of the redirects appearing on the site, as in the image below from a Google patent granted this week:

Screen shot from the bounce pad patent showing calculation of redirect score and spam score to determine whether a site is a bounce pad.

One of the mysteries associated with Google’s search results is how it determines which pages to show when there are duplicate or substantially duplicated documents within its index. A search engine doesn’t want to show searchers a list of search results that contains substantially the same pages, so when it finds pages that are pretty close to being the same, it will create a “cluster” of those pages and choose a representative page to display.

That kind of duplication can happen for a number of reasons, such as someone copying content from another page (with or without permission or license to do so), the majority of the content on a page being a manufacturer’s or publisher’s description, a content management system set up so that the same page gets published more than once at different URLs, content being republished on a mirror site or on sites set up so that if one of them receives too much traffic the others can handle the overflow, and more.
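The clustering idea described above, grouping near-identical pages and showing one representative, can be sketched as follows. The patent's own method isn't spelled out here, so this example swaps in a common textbook approach: Jaccard similarity over word shingles with greedy grouping. The function names, the 0.8 threshold, the sample URLs, and the `quality_scores` used to pick a representative are all hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) in a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles two sets have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(pages, threshold=0.8):
    """Greedily group pages whose text is substantially the same.

    pages maps URL -> page text. The threshold is an assumption.
    """
    clusters = []  # each entry: (shingles of first member, [urls])
    for url, text in pages.items():
        signature = shingles(text)
        for first_signature, members in clusters:
            if jaccard(signature, first_signature) >= threshold:
                members.append(url)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((signature, [url]))
    return [members for _, members in clusters]


def pick_representative(members, quality_scores):
    """Choose the cluster member with the highest (hypothetical) score."""
    return max(members, key=lambda url: quality_scores.get(url, 0.0))


# Hypothetical pages: two copies of one description and one unrelated page.
sample_pages = {
    "example.com/widget": "the blue widget is a durable tool for home repair",
    "mirror.example.net/widget": "the blue widget is a durable tool for home repair",
    "example.org/patents": "a post about search engine patents and redirects",
}
quality_scores = {"example.com/widget": 0.9, "mirror.example.net/widget": 0.4}

for members in cluster_near_duplicates(sample_pages):
    print(pick_representative(members, quality_scores), members)
```

A real search engine would likely weigh many more signals when choosing which page in a cluster to display. The single score here only stands in for that decision.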

Wow! Google Acquires Wowd Search Patents

Earlier this year, Google acquired the patents of Wowd (a play on the word “crowd”), a real time search engine started in 2009. Wowd had no web crawlers, but rather relied upon users downloading a browser application, so that every page they visited was nominated to be included in search results. A press release from February 2010 tells us about the search engine:

Wowd is a real-time search engine for discovering what’s popular on the Web right now. Unlike other engines in the space, Wowd focuses on discovery and exploration of the entire Web, i.e. surfacing trends, breaking news, social media topics, and popular pages. Wowd then taps into the “attention frontier” of its user community to build real-time search results. Wowd makes it easy to discover the latest trends, topics, and hottest Web pages.

In August of last year, Wowd released a search tool for Facebook to add a number of features to the Facebook experience, including custom feeds, game spam blocking, and social search. A look at the Wowd website, however, tells us that “the team has decided to pursue new opportunities,” with some members of the engineering team joining Facebook. There’s no date on the message.

Do Search Engines Use Social Media to Discover New Topics?

A new patent filing from Yahoo raises the question, “How much has social media influenced the expectations of searchers, and forced search engines to change?”

Before I can begin to even think about that, I have to ask if looking at Yahoo patents is even a good idea after their 2009 deal with Microsoft to have Bing power their search results.

The Yahoo patent application was filed after the agreement between Yahoo and Microsoft, and was published last week. Are Yahoo patents still worth spending time with? After reading through the Yahoo patent application about how the search engine might use information from social media platforms to discover recently hot topics and webpages that are relevant to those topics, I would say that they are. The terms of the agreement between Yahoo and Microsoft include a 10 year exclusive right for Microsoft to use search technologies developed by Yahoo, and don’t stop Yahoo from applying those technologies itself.

The patent filing explores “recency-sensitive” queries, where searchers are looking for resources that are both topically relevant and fresh, such as new information about an earthquake. If you’ve been watching Twitter streams, Facebook updates, and other social media, you’ve seen that sometimes these sources are the best and fastest places on the Web to find that kind of information.

It’s possible that a search engine that ignores sources like those isn’t going to be able to return any relevant results for those types of queries – what the patent’s inventors call a “zero recall” problem.

Getting Information about Search, SEO, and the Semantic Web Directly from the Search Engines