Trust and the Internet: Web Search Spam

Trust is a topic that has a profound effect on the way search engines work on the web.

How easy or difficult is it to come up with methods that don’t rely (much) on human judgment to identify spam-free pages that can be trusted, and to locate pages that are intended solely to rank well in search engines without providing any value for visitors, except possibly ads related to the topic of their search?

In a week, there will be a gathering in Edinburgh, Scotland, during the 15th Annual World Wide Web Conference, on the subject of Models of Trust for the Web. While I won’t be attending, it sounds like an interesting presentation, and I wanted to take a look at some of the papers written by presenters at the conference. In this post, I’ll be looking at one of the papers to be presented, and listing some of the other work by its authors.

Problems with Yahoo’s Trustrank Assumptions

Continue reading Trust and the Internet: Web Search Spam

On a Hypertext Roadtrip

Came across a lot of interesting stopping points on my travels around the web over the last few days, some fun stories, and some thoughtful musings…

Favorite title, and analogy, Please Stop With Your Chinese Math, reminded me of all the meetings I’ve been in where I’ve inadvertently rolled my eyes at some statistics, and hoped that no one noticed.

Book on the Science of Google Rankings – Probably has too much math for my tastes, but I’m going to have to get a copy after reading their Deeper Inside Pagerank to see where they pick up the storyline. I hope they don’t kill off any of the main characters.

LEGO’s Incredible Marketing Strategy (yes, legos and marketing are a great match)

Continue reading On a Hypertext Roadtrip

Does Google use whois information?

Some recently published patent applications from Go Daddy explore whether additional whois information might help reduce spam and phishing, and improve search engine results. Google noted in a patent application last year that they might be looking at whois information while presenting and ranking pages.

I don’t know how easy it would be to set up the processes described by Go Daddy, verify the reputation information that they describe, and maintain the records the system would depend upon.

The purpose of whois information

But it might be a moot point to even wonder. A recent decision by the folks at ICANN to limit the use of whois information makes it seem unlikely that the scenarios envisioned by these documents will happen. ICANN’s Generic Names Supporting Organization held a vote in which they decided upon the sole purpose of whois information:

Continue reading Does Google use whois information?

Google on Improving Adsense/Adwords

What are the best ways to pay someone for displaying ads on their sites? What are the easiest ways for people to find sites that they may want to have their ads placed upon? What information should be shared with advertisers about the sites that they might want to advertise upon, or have chosen to place ads on?

Adsense and Adwords are two sides of a content-based advertising system used by Google, and are amongst the methods that the search engine relies upon to make money. One of the main issues that faces Google is finding ways to make it easy to match up Adwords advertisers with the sites of people who display Adsense ads.

A new patent application from the Mountain View-based search engine describes a method to help people who want to place ads find sites that are rich in content, have a lot of traffic, and would make good prospective advertising hosts.

The patent filing is Determining prospective advertising hosts using data such as crawled documents and document access statistics (US Patent Application 20060095322), and lists Timothy Matthew Dierks as its inventor. It was originally filed on November 3, 2004, and published on May 4, 2006, and appears in the USPTO assignment database as being assigned to Google.

Continue reading Google on Improving Adsense/Adwords

Microsoft Patents Dynamic Ranking Changes

Infrastructure

I spent too much time this past weekend paying attention to the NFL draft. Television coverage of the two-day event really isn’t “must see TV,” but there were some surprises. One of them involved the fourth pick of the draft.

According to the New York Daily News, the Jets view left tackle D’Brickashaw Ferguson as the infrastructure for their offense, which Matt Leinart was supposed to be a part of. The Jets were working the phones trying to move back into the top 10 to get the USC quarterback after selecting Ferguson.

The Jets got their lineman, but missed out on the marquee-name quarterback. It wasn’t an exciting choice, but probably a good move. We’ve been hearing for months about changes to the infrastructure of Google, which is almost equally exciting. You know the lineman is going to help the team a lot, but you really wish they had picked that flashy quarterback or speedy running back.

There’s nothing quite like a good infrastructure on a search engine. It isn’t quite the same as an update, but it opens up a lot of possibilities.

Continue reading Microsoft Patents Dynamic Ranking Changes

Advertising on Electronic Billboards via the web

Imagine being able to tap into an advertising network on the web that allowed you to upload your ads for display where large numbers of people will see them offline. A new patent granted today describes a method for doing that.

You’re driving to work, and pass a billboard that would be an ideal place for an ad for your business. You notice that it’s not presently showing an ad, but has a web address displayed, along with a message to “advertise here.” You repeat the address over and over to try to remember it.

You get to your office, fire up a browser, and visit the URL that you’ve been chanting for a couple of minutes now.

The page shows the rates for advertising on that billboard, some editorial guidelines, and a way to register and pay for showing an ad. After registering, and brainstorming for a few minutes on what you would like the billboard to say, you create an ad using PowerPoint, submit it, enter your credit card information, and set the time and duration to display it.

Continue reading Advertising on Electronic Billboards via the web

Improving the Wikipedia results for Search Engine Optimization

I’ve been unhappy for a long time with what is on the pages of the Wikipedia for Search Engine Optimization. I decided this weekend to start making some changes to present the subject from a more rounded perspective.

Some of the things that bothered me about the article as it was:

1. It presented the industry as one largely drawn into two different camps, mostly at odds with one another – white hats and black hats – or those who follow ethical practices as defined by search engine guidelines, and those who don’t.

Ethics aren’t defined by search engines, but rather by moral codes of conduct, and having search engines set the tone of that conduct probably isn’t appropriate. They are businesses, beholden to shareholders, reliant on advertisers, and dependent upon searchers. They’ve never set themselves up to be the moral policemen for the search engine optimization community, and it’s a role that I suspect that they don’t relish.

2. Search engines have expanded their offerings considerably in the past few years to include much more than just organic results, and someone practicing SEO can be helped by having an understanding of RSS feeds, local search, mapping, vertical search, shopping search, news, and paid advertising.

Continue reading Improving the Wikipedia results for Search Engine Optimization

Organizing social tags into hierarchies

Social tags like those used by Flickr or Delicious are interesting in that they allow people to categorize their own efforts (and those of others) and share material based upon those classifications.

But the result of tagging can be a pretty flat list of many categories. A hierarchical ordering of information is useful because it lets people browse and drill down through categories, which can make it easier for them to find the information that they may be looking for.

A Ph.D. student from Stanford, Paul Heymann, has been working with Professor Hector Garcia-Molina to find a way to build Tag Hierarchies to make the efforts of tagging more useful. He notes that:

Tagging systems are excellent at the task that they were designed for—allowing a large, disparate group of users to collaboratively label massive, dynamic information systems like the web, media collections of millions of images, and so on. We are working to make these systems better by automating production of hierarchical taxonomies that describe the data from the raw flat tags generated by users.
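
To make the idea of turning flat tags into a hierarchy a little more concrete, here is a minimal sketch in Python. It is not the method Heymann and Garcia-Molina use, which isn't described here. It is a simple co-occurrence heuristic that I'm assuming purely for illustration: a broader tag becomes the parent of a narrower one when most items carrying the narrower tag also carry the broader one. The function name, the 0.8 threshold, and the sample photos are all hypothetical.

```python
from collections import defaultdict


def build_tag_hierarchy(tagged_items, threshold=0.8):
    """Return {child_tag: parent_tag} inferred from flat tag sets.

    Illustrative heuristic only (an assumption, not the researchers' method):
    tag A is a candidate parent of tag B when A appears on more items than B
    and at least `threshold` of B's items also carry A.
    """
    # Invert the data: for each tag, collect the items it was applied to.
    items_by_tag = defaultdict(set)
    for item_id, tags in tagged_items.items():
        for tag in tags:
            items_by_tag[tag].add(item_id)

    parents = {}
    for child, child_items in items_by_tag.items():
        best_parent, best_size = None, None
        for candidate, cand_items in items_by_tag.items():
            # A parent must be a different, strictly more widely used tag.
            if candidate == child or len(cand_items) <= len(child_items):
                continue
            overlap = len(child_items & cand_items) / len(child_items)
            if overlap >= threshold:
                # Prefer the most specific (least used) qualifying parent.
                if best_size is None or len(cand_items) < best_size:
                    best_parent, best_size = candidate, len(cand_items)
        if best_parent is not None:
            parents[child] = best_parent
    return parents


# Hypothetical sample data: photo ids mapped to the flat tags users gave them.
photos = {
    "p1": {"animal", "dog", "puppy"},
    "p2": {"animal", "dog"},
    "p3": {"animal", "cat"},
    "p4": {"animal", "dog", "puppy"},
    "p5": {"animal"},
}

hierarchy = build_tag_hierarchy(photos)
for child, parent in sorted(hierarchy.items()):
    print(f"{parent} > {child}")
```

With the sample data, "puppy" ends up under "dog", and both "dog" and "cat" end up under "animal". Real tagging data is far noisier than this, so a heuristic this simple would likely need a lower threshold and some handling of synonyms and misspellings.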

Continue reading Organizing social tags into hierarchies

Getting Information about Search, SEO, and the Semantic Web Directly from the Search Engines