The folks at Marketing Shift have issued a Search Engine Football Challenge, and started a fantasy football league (U.S. football) at the web 2.0 fantasy football site FleaFlicker.com.
I just signed up, and added the Delaware Bay Picaroons to the league. Barry Schwartz (RustyBrick), Garrett French (Search Engine Lowdown), and Thomas Shaffer (MSN) are some of the other folks playing. It looks like there’s a division for search engine marketers, and another one for search engine employees.
If you are interested in playing, contact Evan. Contact and other information is included here.
And how do you grab a random page from that search engine?
A new Google employee, Ziv Bar-Yossef, gave a presentation at Google on August 17th answering those questions, which is available on a Google Techtalk video: Random Sampling from a Search Engine’s Index (video).
Ziv Bar-Yossef was most recently at the Technion – Israel Institute of Technology, and as noted in the video, became a Google employee a couple of weeks ago. Before the Technion, he was a researcher at the IBM Almaden Research Center.
The presentation is based upon a paper which won the 2006 International World-Wide Web Conference Best Paper Award: Random Sampling from a Search Engine’s Index.
Being able to grab random pages from a search engine’s index can provide some interesting information about that search engine. The presentation compares things such as the number of dead pages in Google, MSN, and Yahoo, as well as the freshness of text on each, and what percentage of dynamic pages they have indexed.
Continue reading How Do You Estimate the Size of A Search Engine?
Assignments of query themes, favored and non-favored pages, ranking based upon editorial opinion – a new patent from Google provides an interesting way of ranking search results in response to queries. Here’s a quick summary of the processes described in this patent granted today to Google.
(1) A method that provides search results which includes:
(a) receiving a search query,
(b) retrieving one or more pages in response to the search query,
(c) determining whether the search query corresponds to at least one query theme of a group of query themes,
(d) ranking the one or more pages based on a result of the determination, and
(e) serving those ranked pages.
(2) A method for determining an editorial opinion parameter for use in ranking search results:
(a) Developing one or more query themes,
(b) Identifying, for each query theme, a set of favored pages,
(c) Identifying, for each query theme, a set of non-favored pages, and
(d) Determining an editorial opinion parameter for all of the pages in those sets.
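The two claimed methods above can be sketched roughly in code. This is a hypothetical illustration, not Google’s implementation: the theme list, the boost and demotion multipliers, and the scoring are all invented for the sketch.

```python
# Invented example theme; a real system would have many themes,
# each associated with editorially chosen page sets.
QUERY_THEMES = {
    "travel": {"flights", "hotel", "vacation"},
}

# Method 2: derive an editorial opinion parameter per (theme, page).
def editorial_opinion_parameters(theme, favored, non_favored,
                                 boost=2.0, demote=0.5):
    params = {}
    for page in favored:
        params[(theme, page)] = boost   # favored pages get a multiplier > 1
    for page in non_favored:
        params[(theme, page)] = demote  # non-favored pages get one < 1
    return params

# Method 1: detect a theme in the query, then rerank with the parameters.
def rank(query, results, params):
    """results: list of (page, base_score) pairs."""
    terms = set(query.lower().split())
    theme = next((t for t, words in QUERY_THEMES.items()
                  if terms & words), None)
    def score(item):
        page, base = item
        # Pages without an editorial opinion keep their base score.
        return base * params.get((theme, page), 1.0)
    return [page for page, _ in sorted(results, key=score, reverse=True)]
```

If the query matches no theme, the editorial parameters simply never fire and the base ranking goes through untouched, which matches step (d) of the first method ranking "based on a result of the determination."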
Continue reading Google looks at Query Themes and Reranking Based upon Editorial Opinion
Somehow I missed this video tour of Yahoo’s headquarters when it came out on the Yahoo Corporate blog.
The purple cow in the front lobby is a nice touch, and the trip inside the data center is intriguing, too.
Can looking at web traffic flowing through internet access points from Internet Service Providers help a search engine crawl the web more effectively?
A patent originally filed by the folks at Fast Search and Transfer, and later assigned to Overture, was granted last week on the topic of improving the crawling of web pages by looking at that traffic, and it lays out a framework for doing so in fine detail. It also points out some of the limitations of not adopting such a practice, while explaining many of the benefits.
Some of these limitations include problems with:
- Starting to crawl the web from seed pages,
- The limited amount of access time crawlers have to servers,
- Difficulties crawlers have in retrieving dynamic objects, and
- Link topology as a source of relevance.
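One way to picture the alternative the patent describes is a crawl frontier fed by URLs observed in live traffic rather than discovered from seed pages. The sketch below is a loose illustration under my own assumptions, not the patent’s actual mechanism; note that dynamic URLs, which crawlers struggle to discover through links, show up naturally in observed requests.

```python
from collections import Counter

class TrafficInformedFrontier:
    """Hypothetical crawl frontier that prioritizes URLs by how often
    real users request them, instead of starting from seed pages."""

    def __init__(self):
        self.request_counts = Counter()
        self.crawled = set()

    def observe(self, url):
        # Called for each HTTP request seen at the ISP access point;
        # even dynamic URLs with no inbound links appear here.
        self.request_counts[url] += 1

    def next_batch(self, k):
        # Crawl the k most-requested URLs we haven't fetched yet,
        # so crawler-to-server access time goes to pages users want.
        pending = [(u, c) for u, c in self.request_counts.items()
                   if u not in self.crawled]
        pending.sort(key=lambda uc: -uc[1])
        batch = [u for u, _ in pending[:k]]
        self.crawled.update(batch)
        return batch
```

Request frequency here also stands in as a relevance signal drawn from actual usage, rather than from link topology alone.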
Continue reading How a Search Engine Might Use Information from an ISP While Capturing Traffic Flows
I had the good fortune to be able to meet Jim Hedger at the San Jose SES a little over a week ago. While we didn’t have the opportunity to talk at great length, it was nice to meet him. I’ve been reading his blog posts and articles for a few years now. I really enjoyed one of his latest.
On the Tuesday of the four-day conference, I ran into Jill Whalen, who had just finished an interview with someone outside of the press room in the conference hall. It was good to be able to say hi, though I caught Jill on her way to another interview. It seems she had a pretty full day of interviews. One of them was with Jim – Jill Whalen Interviewed at SES San Jose. Jill makes some pretty astute observations. Definitely worth a read.
Jill talks about the growth and maturation of the Search Marketing Industry, a larger focus on in-house SEO, more women in the search sector, the importance of educating clients, and the next High Rankings Seminar in Texas in October. I’ve been a guest at a couple of those seminars, and I’d highly recommend them to people interested in learning more about search engine marketing.
Nice interview, Jim and Jill.
A new patent application from Microsoft looks at content generated to spam search engines. Here’s the problem, as noted in the patent filing:
In the best case, search engine optimizers help web site designers generate content that is well-structured, topical, and rich in relevant keywords or query terms. Unfortunately, some search engine optimizers go well beyond producing relevant pages: they try to boost the ratings of a web site by loading pages with a wide variety of popular query terms, whether relevant or not. In fact, some SEOs go one step further: Instead of manually creating pages that include unrelated but popular query terms, they machine-generate many such pages, each of which contains some monetizable keywords (i.e., keywords that have a high advertising value, such as the name of a pharmaceutical, credit cards, mortgages, etc.). Many small endorsements from these machine-generated pages result in a sizable page rank for the target page. In a further escalation, SEOs have started to set up DNS servers that will resolve any host name within their domain, and typically map it to a single IP address.
Most if not all of the SEO-generated pages exist solely to mislead a search engine into directing traffic towards the “optimized” site; in other words, the SEO-generated pages are intended only for the search engine, and are completely useless to human visitors.
I recognized this quote, which is taken from an interesting research paper from Microsoft, Spam, Damn Spam, and Statistics: Using Statistical Analysis to Locate Spam Web Pages. If you are interested in how search engines are attempting to fight web spam, it’s a “must read” paper.
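To give a flavor of the statistical angle, here is a toy heuristic, not one of the paper’s actual metrics (which examine properties such as host-name resolution and content evolution): measure how densely a page is packed with popular, monetizable query terms. The term list and threshold are invented for illustration.

```python
def popular_term_density(text, popular_terms):
    """Share of a page's words that are popular query terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w in popular_terms) / len(words)

def looks_machine_generated(text, popular_terms, threshold=0.3):
    # Pages stuffed with unrelated but monetizable keywords sit far
    # above the density seen on ordinary prose pages, so they stand
    # out as statistical outliers.
    return popular_term_density(text, popular_terms) > threshold
```

The broader point of the statistical approach is that machine-generated spam, precisely because it is generated at scale, leaves measurable regularities that individual hand-written pages do not.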
Continue reading Page Quality and Web Spam: Using Content Analysis to Detect Spam Pages
OK, so adult content may not be your cup of tea, and you may not really care. Bear with me here. It’s not really the filtering of adult content that interests me here so much as how a search engine algorithm can use queries and user behavior to decide whether or not to filter something.
The following describes a patent application that may or may not be in use by Ask.com. It’s important to note that while the method here describes how the search engine could categorize and filter adult images, its use could be broadened to other content and categories. It provides a nice look at how query sessions and user activity can help a search engine decide what pages and images are about, based upon seeing how people interact with the search engine.
You would think that an algorithm that attempts to filter adult images from the view of children and people who don’t want to see such images would have a visual component to it – that it would try to understand the pictures in question. The following patent application, invented by two Ask.com employees, has no such visual aspect, but relies instead upon user behavior to gauge whether or not an image contains adult material.
There’s a decent possibility that adult content may be returned in response to a query even when the search terms used give no obvious indication that such information was wanted. This patent filing is aimed at gaining more control over what images might be returned during a search, and whether or not they are appropriate for the audience viewing those results.
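A rough sketch of the behavioral idea, with invented names, data, and thresholds (this is purely illustrative, not Ask.com’s algorithm): score each image by the share of its clicks that arrive from queries containing known adult terms, then drop high-scoring images when safe search is on.

```python
from collections import defaultdict

def adult_scores(click_log, adult_terms):
    """click_log: (query, image_id) pairs from search sessions.

    Returns, per image, the share of its clicks that came from
    queries containing a known adult term -- the image itself is
    never inspected, only the behavior around it."""
    total = defaultdict(int)
    flagged = defaultdict(int)
    for query, image in click_log:
        total[image] += 1
        if any(t in query.lower().split() for t in adult_terms):
            flagged[image] += 1
    return {img: flagged[img] / total[img] for img in total}

def safe_results(images, scores, threshold=0.5):
    # With safe search on, filter images mostly clicked from adult
    # queries, even if the current query looks innocuous.
    return [img for img in images if scores.get(img, 0.0) < threshold]
```

An image that innocuous queries click on constantly keeps a low score, so the same mechanism that filters adult images also protects ordinary images from being filtered by accident.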
Continue reading Ask.com Using Queries to Detect and Filter Adult Content?