Can looking at web traffic flowing through Internet Service Providers’ access points help a search engine crawl the web more effectively?
A patent originally developed by the folks at Fast Search and Transfer, and assigned to Overture, was granted last week. It describes improving the crawling of web pages by looking at that traffic, and it lays out the framework for doing so in fine detail. It also points out some of the limitations of crawling without such an approach, and explains many of the benefits of using one.
Some of these limitations include problems with:
- Starting to crawl the web from seed pages,
- The limited amount of access time crawlers have to servers,
- Difficulties crawlers have in retrieving dynamic objects, and
- Link topology as a source of relevance.
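To make the idea a little more concrete, here is a minimal sketch of how URLs observed in captured traffic might be turned into a crawl queue. It assumes, hypothetically, that the traffic has already been reduced to (host, path) pairs; the function name `traffic_frontier` and its parameters are illustrative and not taken from the patent. It ranks URLs by how often real users requested them and skips ones already crawled, which speaks to the seed-page and dynamic-object limitations above: a query-string URL that people actually visit shows up even if nothing links to it.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def traffic_frontier(
    observed: Iterable[Tuple[str, str]],
    already_crawled: Set[str],
    limit: int = 10,
) -> List[str]:
    """Return up to `limit` uncrawled URLs, most frequently requested first.

    `observed` is a hypothetical stream of (host, path) pairs extracted
    from captured traffic; paths may include query strings, so dynamic
    pages that users actually request are included.
    """
    # Count how often each full URL was requested by real users.
    counts = Counter(f"http://{host}{path}" for host, path in observed)
    # most_common() sorts by count descending; ties keep first-seen order.
    ranked = [url for url, _ in counts.most_common()]
    # Skip URLs the crawler already has, then cap the batch size.
    fresh = [url for url in ranked if url not in already_crawled]
    return fresh[:limit]
```

For example, given two requests for `example.com/a`, one for `example.com/search?q=x`, and one for an already-crawled `other.org/`, the sketch would queue `/a` first and then the dynamic search URL.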
Continue reading How a Search Engine Might Use Information from an ISP While Capturing Traffic Flows
I had the good fortune to meet Jim Hedger at SES San Jose a little over a week ago. We didn’t get the chance to talk at great length, but it was nice to meet him; I’ve been reading his blog posts and articles for a few years now. I really enjoyed one of his latest pieces.
On Tuesday of the four-day conference, I ran into Jill Whalen, who had just finished an interview with someone outside the press room in the conference hall. It was good to be able to say hi, though I caught her on her way to another interview. It seems she had a pretty full day of interviews. One of them was with Jim – Jill Whalen Interviewed at SES San Jose. Jill makes some pretty astute observations. Definitely worth a read.
Jill talks about the growth and maturation of the Search Marketing Industry, a larger focus on in-house SEO, more women in the search sector, the importance of educating clients, and the next High Rankings Seminar in Texas in October. I’ve been a guest at a couple of those seminars, and I’d highly recommend them to people interested in learning more about search engine marketing.
Nice interview, Jim and Jill.
A new patent application from Microsoft looks at content generated to spam search engines. Here’s the problem, as noted in the patent filing:
In the best case, search engine optimizers help web site designers generate content that is well-structured, topical, and rich in relevant keywords or query terms. Unfortunately, some search engine optimizers go well beyond producing relevant pages: they try to boost the ratings of a web site by loading pages with a wide variety of popular query terms, whether relevant or not. In fact, some SEOs go one step further: Instead of manually creating pages that include unrelated but popular query terms, they machine-generate many such pages, each of which contains some monetizable keywords (i.e., keywords that have a high advertising value, such as the name of a pharmaceutical, credit cards, mortgages, etc.). Many small endorsements from these machine-generated pages result in a sizable page rank for the target page. In a further escalation, SEOs have started to set up DNS servers that will resolve any host name within their domain, and typically map it to a single IP address.
Most if not all of the SEO-generated pages exist solely to mislead a search engine into directing traffic towards the “optimized” site; in other words, the SEO-generated pages are intended only for the search engine, and are completely useless to human visitors.
I recognized this quote, which is taken from an interesting research paper from Microsoft, Spam, Damn Spam, and Statistics: Using Statistical Analysis to Locate Spam Web Pages. If you are interested in how search engines are attempting to fight web spam, it’s a “must read” paper.
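The last sentence of that quote describes a pattern that can be measured directly. Below is a small sketch, under assumed inputs, that counts how many distinct host names resolve to each IP address and flags addresses at or above a threshold. The input format, the function name `suspicious_ips`, and the default threshold are hypothetical; this illustrates one statistical signal of the kind the paper discusses, not Microsoft’s actual method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def suspicious_ips(
    resolutions: Iterable[Tuple[str, str]],
    threshold: int = 3,
) -> Dict[str, int]:
    """Map each IP to its count of distinct host names, keeping only IPs
    at or above `threshold` (a hypothetical cutoff, not one from the paper).

    `resolutions` is an assumed list of (host_name, ip_address) pairs.
    """
    hosts_per_ip: Dict[str, Set[str]] = defaultdict(set)
    for host, ip in resolutions:
        # Host names are case-insensitive, so normalize before counting.
        hosts_per_ip[ip].add(host.lower())
    return {
        ip: len(hosts)
        for ip, hosts in hosts_per_ip.items()
        if len(hosts) >= threshold
    }
```

A real system would weigh a signal like this alongside others, since legitimate shared hosting can also put many host names on a single IP address.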
Continue reading Page Quality and Web Spam: Using Content Analysis to Detect Spam Pages
OK, so adult content may not be your cup of tea, and you may not really care. Bear with me here. It isn’t filtering adult content itself that interests me so much as how a search engine algorithm can use queries and user behavior to decide whether or not to filter something.
The following describes a patent application that may or may not be in use by Ask.com. It’s also worth noting that while the method describes how the search engine could categorize and filter adult images, it could be broadened to other content and categories. It provides a nice look at how a search engine might use query sessions and user activity to decide what pages and images are about, based upon how people interact with it.
You would think that an algorithm that attempts to filter adult images from the view of children and people who don’t want to see such images would have a visual component to it – that it would try to understand the pictures in question. The following patent application, invented by two Ask.com employees, has no such visual aspect, but relies instead upon user behavior to gauge whether or not an image contains adult material.
There’s a decent chance that adult content may be returned in response to a query even when nothing in the search terms suggests that the searcher intended to request it. This patent is aimed at gaining more control over which images might be returned during a search, and whether or not they are appropriate for the audience viewing those results.
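A behavior-only approach like this might look something like the sketch below. It assumes, hypothetically, that query logs are grouped into sessions of (query, clicked image) pairs, and that a small seed list of explicit terms marks a query as adult; any click in a session containing such a query counts as an adult-context click, even if the query that produced the click looked innocent. The term list, the `min_clicks` and `min_ratio` thresholds, and the function names are illustrative assumptions, not details from the Ask.com application.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical seed list; a real system would use a much larger, curated one.
ADULT_TERMS = {"explicit", "nsfw"}

# A session is a list of (query, clicked_image_id or None) pairs.
Session = List[Tuple[str, Optional[str]]]


def is_adult_query(query: str) -> bool:
    return any(word in ADULT_TERMS for word in query.lower().split())


def adult_images(
    sessions: Iterable[Session],
    min_clicks: int = 5,
    min_ratio: float = 0.6,
) -> Set[str]:
    """Flag images whose clicks mostly come from sessions that contain an
    adult query, regardless of which query produced each click."""
    total: Dict[str, int] = defaultdict(int)
    adult: Dict[str, int] = defaultdict(int)
    for session in sessions:
        session_is_adult = any(is_adult_query(q) for q, _ in session)
        for _, image_id in session:
            if image_id is None:
                continue  # a query with no image click
            total[image_id] += 1
            if session_is_adult:
                adult[image_id] += 1
    # Require enough evidence before labeling an image.
    return {
        img for img, n in total.items()
        if n >= min_clicks and adult[img] / n >= min_ratio
    }
```

Requiring a minimum number of clicks keeps a single unusual session from labeling an image on its own.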
Continue reading Ask.com Using Queries to Detect and Filter Adult Content?
In September 2001, Google acquired the technological assets of Outride, which specialized in online information retrieval technologies. A white paper from the Outride group does a good job of explaining one of the approaches they were taking to personalized search (pdf).
We posit that at least two different computational techniques need to be combined to personalize search: contextualization and individualization. By contextualization, we mean the interrelated conditions that occur within an activity. Individualization means the totality of characteristics that distinguishes an individual. Contextualization includes factors like the nature of information available, the information currently being examined, the applications in use, when, and so on. Individualization encompasses elements like the user’s goals, prior and tacit knowledge, past information-seeking behaviors, among others. These elements are used to build a user model to personal relevancy computationally, as we will describe. It is this focus on the user and their context within the application of search that makes personalized search a compelling area to explore within the framework of contextual computing.
Google has developed some of its own personalization technology, but it’s possible that the research and methods developed by Outride have played a part in what it has done so far. I found this set of comments in the Outride paper interesting:
It is worth mentioning upfront that since the following techniques alter the search experience, careful integration of these features into the user interface is required. In particular, the interface needs to provide a way to explain what the system is doing to personalize the experience as well as to undo the personalization.
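As a rough illustration of the individualization side, along with the ability to undo it, here is a sketch that blends each result’s original score with a match against a user’s interest profile. The result format, the profile format, the blending weight `alpha`, and the function name are hypothetical assumptions, not Outride’s or Google’s method, and the sketch ignores contextualization entirely. Setting `alpha` to 0 turns personalization off and restores the original ranking.

```python
from typing import Dict, List, Tuple

# (url, base_score, topic_terms) for each result; a hypothetical format.
Result = Tuple[str, float, List[str]]


def personalize(
    results: List[Result],
    profile: Dict[str, float],
    alpha: float = 0.3,
) -> List[str]:
    """Re-rank results by blending the original score with the user's
    interest in each result's terms. alpha=0 undoes personalization."""

    def blended(item: Result) -> float:
        _, base, terms = item
        # Average interest weight over the result's terms (0 if unknown).
        interest = (
            sum(profile.get(t, 0.0) for t in terms) / len(terms) if terms else 0.0
        )
        return (1 - alpha) * base + alpha * interest

    # sorted() is stable even with reverse=True, so ties keep original order.
    return [url for url, _, _ in sorted(results, key=blended, reverse=True)]
```

Keeping the unpersonalized score around, rather than overwriting it, is what makes the “undo” the paper calls for cheap to offer in the interface.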
Continue reading Google’s Personalized Ecommerce Inheritance
If you tried to visit this site at some point in the last 12 hours and were unable to connect, I apologize.
Just before 9:00 am, a construction accident severed fiber conduits running to the datacenter that hosts my site (and one of my email servers).
The severed fiber cable
My host set up an emergency update page so that we could keep track of the status of the fiber repairs, complete with pictures of the damage and of the people working to fix it. I’m appreciative that they did.
I’m sorry for any inconvenience this may have caused. A big thanks to the many who worked so hard to get the datacenter back online.
Continue reading Construction Accident Downs SEO by the Sea
Google has acquired a mobile company which appears to have a rich technical background in face and object recognition.
A hat tip to Ionut Alex. Chitu, who writes about Google’s acquisition of Neven Vision and makes some very good points about why Neven Vision was a great choice for bringing mobile technology to Google.
Neven Vision appears to be the trade name of Nevengineering, Inc., which is the assignee of a number of United States patents, and was the successor to another recognition software company, Eyematic Interfaces, Inc.
According to the Neven Vision web site, they have a number of offerings based upon the use of mobile technology:
Continue reading Google Acquires Neven Vision: Adding Object and Facial Recognition Mobile Technology
Back in July, I looked at a Google patent that described issuing coupons from stores, in a post titled Google’s Holy Grail of Shopping? It appears that Google has taken the first of what could be a number of steps towards putting the processes described in that patent into practice, with news that they are enabling advertisers to add coupons to their listings in Google Maps.
Danny Sullivan covers many of the details of this new service in Google Maps Gets Coupons.
The patent, Generating and/or serving dynamic promotional offers such as coupons and advertisements, does a great job of laying out some possible next steps and detailing how these types of discount offerings could be expanded to make Google Local a great avenue for small businesses to attract customers. Check out my previous post on the patent for some of those possibilities.
Continue reading Google Coupons the Start of Something Bigger