A new patent application from Microsoft looks at content generated to spam search engines. Here’s the problem, as noted in the patent filing:
In the best case, search engine optimizers help web site designers generate content that is well-structured, topical, and rich in relevant keywords or query terms. Unfortunately, some search engine optimizers go well beyond producing relevant pages: they try to boost the ratings of a web site by loading pages with a wide variety of popular query terms, whether relevant or not. In fact, some SEOs go one step further: Instead of manually creating pages that include unrelated but popular query terms, they machine-generate many such pages, each of which contains some monetizable keywords (i.e., keywords that have a high advertising value, such as the name of a pharmaceutical, credit cards, mortgages, etc.). Many small endorsements from these machine-generated pages result in a sizable page rank for the target page. In a further escalation, SEOs have started to set up DNS servers that will resolve any host name within their domain, and typically map it to a single IP address.
Most if not all of the SEO-generated pages exist solely to mislead a search engine into directing traffic towards the “optimized” site; in other words, the SEO-generated pages are intended only for the search engine, and are completely useless to human visitors.
I recognized this quote, which is taken from an interesting research paper from Microsoft, Spam, Damn Spam, and Statistics: Using Statistical Analysis to Locate Spam Web Pages. If you are interested in how search engines are attempting to fight web spam, it’s a “must read” paper.
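To make the wildcard DNS trick from the quote a little more concrete, here is a minimal sketch of one way to surface it: count how many distinct host names resolve to the same IP address. This is my own illustration and not the method from the paper or the patent filing. The function names, the threshold, and the sample host names and addresses are all hypothetical, and it assumes the (host name, IP address) pairs have already been collected by a crawler.

```python
from collections import defaultdict


def hosts_per_ip(resolved):
    """Group host names by the IP address they resolve to.

    `resolved` is an iterable of (host_name, ip_address) pairs that a
    crawler is assumed to have collected already (hypothetical input).
    """
    by_ip = defaultdict(set)
    for host, ip in resolved:
        by_ip[ip].add(host.lower())
    return by_ip


def suspicious_ips(resolved, threshold=3):
    """Return IPs that serve at least `threshold` distinct host names.

    The threshold is an arbitrary illustrative value, not one taken
    from the paper or the patent filing.
    """
    return {
        ip: sorted(hosts)
        for ip, hosts in hosts_per_ip(resolved).items()
        if len(hosts) >= threshold
    }


# Made-up sample data using reserved documentation names and addresses.
sample = [
    ("cheap-pills.spam.example", "192.0.2.10"),
    ("low-rate-mortgage.spam.example", "192.0.2.10"),
    ("credit-cards-now.spam.example", "192.0.2.10"),
    ("www.shop.example", "198.51.100.7"),
    ("blog.shop.example", "198.51.100.7"),
]

flagged = suspicious_ips(sample, threshold=3)
for ip, hosts in flagged.items():
    print(ip, len(hosts), hosts)
```

A count like this is only a hint. Plenty of legitimate shared hosts serve many host names from one address, so a signal of this kind would need to be combined with others before anything is treated as spam.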
Continue reading Page Quality and Web Spam: Using Content Analysis to Detect Spam Pages
Ok, so adult content may not be your cup of tea, and you may not really care. Bear with me here. It’s not really so much filtering adult content that I’m interested in either, but instead how a search engine algorithm can use queries and user behavior to decide whether or not to filter something.
The following describes a patent application that may or may not be in use by Ask.com. I think that it’s important to also note that while the method here describes how the search engine could categorize and filter adult images, its use could be broadened to other content and categories. It provides a nice look at how query sessions and user activity can be used to help a search engine decide what pages and images are about, based upon seeing how people interact with the search engine.
You would think that an algorithm that attempts to filter adult images from the view of children and people who don’t want to see such images would have a visual component to it – that it would try to understand the pictures in question. The following patent application, invented by two Ask.com employees, has no such visual aspect, but relies instead upon user behavior to gauge whether or not an image contains adult material.
There’s a decent chance that adult content may be returned in response to a query even when the search terms give no obvious sign that the searcher intended to request it. This patent is aimed at gaining more control over what images might be returned during a search, and whether or not they are appropriate for the audience viewing those results.
Continue reading Ask.com Using Queries to Detect and Filter Adult Content?
Back in September, 2001, Google acquired the technological assets of Outride, which specialized in online information retrieval technologies. A white paper from the Outride group explains fairly well one of the approaches that they were taking in the field of personalized search (pdf).
We posit that at least two different computational techniques need to be combined to personalize search: contextualization and individualization. By contextualization, we mean the interrelated conditions that occur within an activity. Individualization means the totality of characteristics that distinguishes an individual. Contextualization includes factors like the nature of information available, the information currently being examined, the applications in use, when, and so on. Individualization encompasses elements like the user’s goals, prior and tacit knowledge, past information-seeking behaviors, among others. These elements are used to build a user model to personal relevancy computationally, as we will describe. It is this focus on the user and their context within the application of search that makes personalized search a compelling area to explore within the framework of contextual computing.
Google has developed some of their own personalization technology, but it’s possible that the research and methods developed by Outride have played a part in what they’ve done so far. I thought that this was an interesting set of comments in the paper from Outride:
It is worth mentioning upfront that since the following techniques alter the search experience, careful integration of these features into the user interface is required. In particular, the interface needs to provide a way to explain what the system is doing to personalize the experience as well as to undo the personalization.
Continue reading Google’s Personalized Ecommerce Inheritance
If you’ve tried to visit this site sometime within the last 12 hours, and were unable to connect, I apologize.
Just before 9:00 am, a construction accident severed fiber conduits running to the datacenter where my site (and one of my email servers) is hosted.
The severed fiber cable
My host set up an emergency update page that let us keep track of the status of repairs to the fiber. I appreciate that they did, and that they included pictures of the problem and of the people who were working on fixing it.
I’m sorry for any inconvenience this may have caused. A big thanks to the many who worked so hard to get the datacenter back online.
Continue reading Construction Accident Downs SEO by the Sea
Google has acquired a mobile company which appears to have a rich technical background in face and object recognition.
A hat tip to Ionut Alex. Chitu, who writes about the acquisition of Neven Vision by Google, and makes some very good points on why Neven Vision was a great choice when it comes to bringing mobile technology to Google.
Neven Vision appears to be the trade name of Nevengineering, Inc., which has been assigned a number of patents by the United States Patent Office, and was the successor to another recognition software company, Eyematic Interfaces, Inc.
According to the Neven Vision web site, they have a number of offerings based upon the use of mobile technology:
Continue reading Google Acquires Neven Vision: Adding Object and Facial Recognition Mobile Technology
Back in July, in Google’s Holy Grail of Shopping?, I looked at a patent from Google that described issuing coupons from stores. It appears that Google has taken the first of what could be a number of steps towards making the processes described in that patent come true, with news that they are enabling advertisers to add coupons to their listings in Google Maps.
Danny Sullivan captures a lot of the details about this new service in Google Maps Gets Coupons.
The patent, Generating and/or serving dynamic promotional offers such as coupons and advertisements, does a great job of laying out some possible next steps and detailing how these types of discount offerings could be expanded to make Google Local a great avenue for small businesses to attract customers. Check out my previous post on the patent for some of those possibilities.
Continue reading Google Coupons the Start of Something Bigger
Now that I’ve had a chance to catch a little shut-eye after a restless late night flight from the San Francisco Airport to Baltimore Friday night, I’ve been able to sort through the pictures I took on the trip. I bought a new camera the day I began my journey, and I’m still getting used to it, so sadly many of my pictures were a little too blurry to share. But some of them turned out ok, and a sampling of those appears below.
I had a great time in California, with many chances to share some time with old friends and to make new ones, and the opportunity to exchange some ideas with a lot of sharp folks. Thanks to everyone who made this trip and conference such an enjoyable visit to the west coast. Here are some images from my visit:
Before traveling to the Search Engine Strategies Conference, I had the chance to spend a few days touring around San Francisco, including a trip to some wineries in Sonoma Valley. Here’s a picture of me and one of my hosts, Barry Swain, relaxing in front of a wine tasting room.
Continue reading Some Pictures from Sonoma, San Francisco, and San Jose
At the Eleventh International World Wide Web Conference, John Tomlin, Andrew Tomkins, Jasmine Novak, and Arvind Arasu presented a poster titled PageRank Computation and the Structure of the Web: Experiments and Algorithms (pdf). The first three authors wrote the paper as IBM employees, and co-author Arvind Arasu is listed on the document as a member of the Computer Science Department at Stanford University.
Three of those four authors are listed as the inventors of a newly granted patent that describes a way to rapidly compute PageRank. The patent was filed with the US Patent Office around the same time as the presentation of the paper. John Tomlin and Andrew Tomkins are now at Yahoo, and Arvind Arasu is a researcher at Microsoft.
System and method for rapid computation of PageRank
Invented by John Anthony Tomlin, Andrew S. Tomkins, and Arvind Arasu
Assigned to IBM
US Patent 7,089,252
Granted August 8, 2006
Filed April 25, 2002
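This excerpt doesn't go into how the patent speeds up the calculation, so the sketch below is not the patented method. It is only the textbook power-iteration form of PageRank, included as a reference point for what is being computed. The toy link graph, the function name, the 0.85 damping factor, and the convergence settings are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=200):
    """Textbook power-iteration PageRank over a small link graph.

    `links` maps each page to the list of pages it links to.
    The damping factor and stopping settings are illustrative
    assumptions, not values taken from the patent.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(max_iter):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each page passes an equal share of its damped rank to every page it links to.
        for u, targets in links.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        delta = sum(abs(new_rank[x] - rank[x]) for x in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical three-page web: a and b both link to c, and c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Every pass of the loop touches every link in the graph, which is why a faster way to reach the same scores matters at web scale.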
Continue reading IBM Granted Patent for Pagerank