Twitter Poll – How Does Google Index Content on the Web?

Google Indexes by Websites, Pages, or URLs

I thought this was an interesting question to ask people because I think it’s often misunderstood. Google treats content found at different URLs as different content, even when the content itself is identical, as in the following examples:

http://www.example.com
https://www.example.com
http://example.com
http://example.com/index.htm
http://example.com/Index.htm
http://example.com/default.asp
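
To make the idea more concrete, here is a minimal sketch of the kind of URL normalization a crawler might apply to map variants like these together. This is an illustration only, not Google’s canonicalization logic; the specific rules (lowercasing the host, dropping a default document name, stripping “www.”) are assumptions chosen just to show the idea.

```python
# A minimal sketch of URL normalization, illustrating why the variants above
# can be treated as distinct URLs until something explicitly maps them together.
# NOT Google's canonicalization logic; the rules below are illustrative assumptions.
from urllib.parse import urlsplit, urlunsplit

DEFAULT_DOCS = {"/index.htm", "/index.html", "/default.asp"}

def normalize(url: str) -> str:
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    if path.lower() in DEFAULT_DOCS:
        path = "/"
    # The scheme is kept, so http:// and https:// still normalize differently,
    # just as the poll examples treat them as different URLs.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))

if __name__ == "__main__":
    variants = [
        "http://www.example.com",
        "https://www.example.com",
        "http://example.com",
        "http://example.com/index.htm",
        "http://example.com/Index.htm",
        "http://example.com/default.asp",
    ]
    for v in variants:
        print(v, "->", normalize(v))
```

Without some mapping like this (or hints from the site itself), each of those strings is simply a different address, which is why the poll question is trickier than it first looks.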

One of the most interesting papers I’ve come across on this topic is this one (one of the authors, Ziv Bar-Yossef, joined Google shortly after it was released):

Continue reading “Twitter Poll – How Does Google Index Content on the Web?”

How Google May Respond to Reverse Engineering of Web Spam Detection

Web Spam Detection in Local Search at Google

The ultimate goal of any spam detection system is to penalize “spammy” content.

~ Reverse engineering circumvention of spam detection algorithms (Linked to below)

Four years ago, I wrote a post, The Google Rank-Modifying Spammers Patent, about a Google patent which told us that Google might be keeping an eye out for people attempting to manipulate organic search results by spamming pages, and that Google may delay responding to those manipulative actions to make the spammers think that whatever they were doing had no impact upon search results. That patent focused upon organic search results, and Google’s Head of Web Spam, Matt Cutts, responded to my post with a video in which he insisted that just because Google produces a patent on something doesn’t mean they are going to use it. The video is titled “What’s the latest SEO misconception that you would like to put to rest?” Matt’s response is as follows:

I’m not sure how effective the process in that patent was, but there is now a similar patent from Google that focuses upon rankings of local search results. The patent describes this web spam detection problem in this way:

The business listing search results, or data identifying a business, its contact information, website address, and other associated content, may be displayed to a user such that the most relevant businesses may be easily identified. In an attempt to generate more customers, some businesses may employ methods to include multiple different listings to identify the same business. For example, a business may contribute a large number of listings for nonexistent business locations to a search engine, and each listing is provided with a contact telephone number that is associated with the actual business location. The customer may be defrauded by contacting or visiting an entity believed to be at a particular location only to learn that the business is actually operating from a completely different location. Such fraudulent marketing tactics are commonly referred to as “fake business spam”.
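
The patent describes detection in more general terms, but as a rough illustration of the kind of signal such a system might look at, here is a small sketch that flags a phone number attached to an unusually large number of distinct listing addresses. This is my own simplified heuristic, not the method claimed in the patent, and the threshold is arbitrary.

```python
# A simplified, hypothetical heuristic for the "fake business spam" pattern the
# patent quote describes: many listings at different (possibly nonexistent)
# addresses that all share one contact phone number. Illustration only; not the
# patent's claimed method, and the max_locations threshold is an assumption.
from collections import defaultdict
from typing import Iterable, NamedTuple

class Listing(NamedTuple):
    business_name: str
    address: str
    phone: str

def suspicious_phones(listings: Iterable[Listing], max_locations: int = 5) -> dict:
    """Return phone numbers that appear across more than max_locations distinct addresses."""
    addresses_by_phone = defaultdict(set)
    for listing in listings:
        addresses_by_phone[listing.phone].add(listing.address.strip().lower())
    return {
        phone: sorted(addrs)
        for phone, addrs in addresses_by_phone.items()
        if len(addrs) > max_locations
    }

if __name__ == "__main__":
    sample = [Listing("Acme Plumbing", f"{n} Main St, Springfield", "555-0100")
              for n in range(1, 10)]
    sample.append(Listing("Corner Bakery", "12 Oak Ave, Springfield", "555-0199"))
    print(suspicious_phones(sample))  # flags 555-0100, not 555-0199
```

A real system would presumably weigh many more signals than a single shared phone number, but the sketch shows why a listing that claims many locations can stand out in the data.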

Continue reading “How Google May Respond to Reverse Engineering of Web Spam Detection”

The US is Asking for Help Understanding the Impacts of Artificial Intelligence

Artificial Intelligence, by Global Panorama. Some Rights Reserved

As we approach the celebration of the 4th of July, I thought it might be interesting to share a request for information published in the US Federal Register and a post on the White House blog. The US government is interested in what Artificial Intelligence might mean for the people of the United States, and in how we might learn more about it. To find out, they are asking for comments by July 22, 2016.

Ed Felten, Deputy U.S. Chief Technology Officer, wrote the following blog post about what the government would like to learn: How to Prepare for the Future of Artificial Intelligence. He tells us that the reason for the request for public input is to learn from a wide range of people what we can do to prepare:

Continue reading “The US is Asking for Help Understanding the Impacts of Artificial Intelligence”