What Role Does Editorial Opinion Play in Ranking Search Results?
Assignments of query themes, favored and non-favored pages, ranking based upon editorial opinion – a new patent from Google provides an interesting way of ranking search results in response to queries. Here’s a quick summary of the processes described in this patent granted today to Google.
(1) A method that provides search results which includes:
(a) Receiving a search query,
(b) Retrieving one or more pages in response to the search query,
(c) Determining whether the search query corresponds to at least one query theme of a group of query themes,
(d) Ranking the one or more pages based on a result of the determination, and
(e) Serving those ranked pages.
(2) A method for determining an editorial opinion parameter for use in ranking search results:
Continue reading “Google looks at Ranking Based upon Editorial Opinion”
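To make the steps in method (1) a little more concrete, here is a minimal Python sketch. The summary above doesn't say how a query is matched to a theme or how much favored and non-favored pages move in the rankings, so the `QueryTheme` structure, the trigger-term matching, and the `boost` and `demotion` multipliers are all my own assumptions for illustration, not something taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class QueryTheme:
    # Hypothetical structure: a theme name, the terms assumed to signal it,
    # and the pages editors favor or disfavor for that theme.
    name: str
    trigger_terms: set
    favored: set = field(default_factory=set)
    non_favored: set = field(default_factory=set)


def matching_themes(query, themes):
    """Step (c): decide which themes, if any, the query corresponds to."""
    words = set(query.lower().split())
    return [theme for theme in themes if words & theme.trigger_terms]


def rank(query, retrieved, themes, boost=2.0, demotion=0.5):
    """Steps (d) and (e): adjust base scores by editorial opinion, then order.

    `retrieved` maps each page URL to a base relevance score (step (b)).
    The multipliers are assumed values, not figures from the patent.
    """
    matched = matching_themes(query, themes)
    adjusted = {}
    for url, score in retrieved.items():
        for theme in matched:
            if url in theme.favored:
                score *= boost
            elif url in theme.non_favored:
                score *= demotion
        adjusted[url] = score
    return sorted(adjusted, key=adjusted.get, reverse=True)


themes = [
    QueryTheme(
        name="health",
        trigger_terms={"diabetes"},
        favored={"a.example"},
        non_favored={"b.example"},
    )
]
retrieved = {"b.example": 1.0, "a.example": 0.6, "c.example": 0.9}
themed_order = rank("diabetes treatment", retrieved, themes)
unthemed_order = rank("car repair", retrieved, themes)
```

In this toy example the favored page moves to the top only when the query matches the theme. A query that matches no theme keeps its original ordering.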
Somehow I missed this video tour of Yahoo’s headquarters when it came out on the Yahoo Corporate blog.
The purple cow in the front lobby is a nice touch, and the trip inside the data center is intriguing, too.
Can looking at web traffic flowing through internet access points from Internet Service Providers help a search engine crawl the web more effectively?
A patent originally developed by the folks at Fast Search and Transfer, and assigned to Overture, was granted last week on the topic of improving the crawling of web pages by looking at that traffic, and it lays out the framework for doing so in fine detail. It also points out some of the limitations of crawling without such traffic information, and explains many of the benefits of using it.
Some of these limitations include problems with:
- Starting to crawl the web from seed pages,
- The limited amount of access time crawlers have to servers,
- Difficulties crawlers have in retrieving dynamic objects, and
- Link topology as a source of relevance.
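As a rough illustration of why traffic data might help with the first and third limitations, here is a small Python sketch. It assumes the search engine has access to a plain list of URLs requested through an access point, which is my own simplification. The function names are hypothetical and the normalization rules are deliberately basic. The patent's actual mechanism may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lowercase the scheme and host and drop the fragment.

    The path and query string are kept, so dynamic pages that differ only
    in their query parameters stay distinct.
    """
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def discover_from_traffic(observed_urls, already_known):
    """Return URLs seen in traffic that the crawler has not met, in first-seen order."""
    known = {normalize(url) for url in already_known}
    new_urls = []
    for url in observed_urls:
        candidate = normalize(url)
        if candidate not in known:
            known.add(candidate)
            new_urls.append(candidate)
    return new_urls


# Hypothetical data: one page reached by following links from a seed,
# and three requests observed in traffic.
known_from_crawl = ["http://example.com"]
observed = [
    "http://Example.com/a?id=1",
    "http://example.com/a?id=1#top",
    "http://example.com/",
]
frontier_additions = discover_from_traffic(observed, known_from_crawl)
```

The point of the sketch is that a dynamic URL nobody links to can still reach the crawl queue because someone actually requested it.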
Continue reading “How a Search Engine Might Use Information from an ISP While Capturing Traffic Flows”
I had the good fortune to be able to meet Jim Hedger at the San Jose SES a little over a week ago. While we didn’t have the opportunity to talk at great length, it was nice to meet him. I’ve been reading his blog posts and articles for a few years now. I really enjoyed one of his latest.
On the Tuesday during the four-day conference, I ran into Jill Whalen, who had just finished an interview with someone outside of the press room in the conference hall. It was good to be able to say hi, though I caught Jill going to another interview. Seems like she had a pretty full day of interviews. One of them was with Jim – Jill Whalen Interviewed at SES San Jose. Jill makes some pretty astute observations. Definitely worth a read.
Jill talks about the growth and maturation of the Search Marketing Industry, a larger focus on in-house SEO, more women in the search sector, the importance of educating clients, and the next High Rankings Seminar in Texas in October. I’ve been a guest at a couple of those seminars, and I’d highly recommend them to people interested in learning more about search engine marketing.
Nice interview, Jim and Jill.
A new patent application from Microsoft looks at content generated to spam search engines. Here’s the problem, as noted in the patent filing:
In the best case, search engine optimizers help web site designers generate content that is well-structured, topical, and rich in relevant keywords or query terms. Unfortunately, some search engine optimizers go well beyond producing relevant pages: they try to boost the ratings of a web site by loading pages with a wide variety of popular query terms, whether relevant or not. In fact, some SEOs go one step further: Instead of manually creating pages that include unrelated but popular query terms, they machine-generate many such pages, each of which contains some monetizable keywords (i.e., keywords that have a high advertising value, such as the name of a pharmaceutical, credit cards, mortgages, etc.). Many small endorsements from these machine-generated pages result in a sizable page rank for the target page. In a further escalation, SEOs have started to set up DNS servers that will resolve any host name within their domain, and typically map it to a single IP address.
Most if not all of the SEO-generated pages exist solely to mislead a search engine into directing traffic towards the “optimized” site; in other words, the SEO-generated pages are intended only for the search engine, and are completely useless to human visitors.
I recognized this quote, which is taken from an interesting research paper from Microsoft, Spam, Damn Spam, and Statistics: Using Statistical Analysis to Locate Spam Web Pages. If you are interested in how search engines are attempting to fight web spam, it’s a “must read” paper.
Continue reading “Page Quality and Web Spam: Using Content Analysis to Detect Spam Pages”
Ok, so adult content may not be your cup of tea, and you may not really care. Bear with me here. It’s not really so much filtering adult content that I’m interested in either, but instead how a search engine algorithm can use queries and user behavior to decide whether or not to filter something.
The following describes a patent application that may or may not be in use by Ask.com. I think that it’s important to also note that while the method here describes how the search engine could categorize and filter adult images, its use could be broadened to other content and categories. It provides a nice look at how query sessions and user activity can be used to help a search engine decide what pages and images are about, based upon seeing how people interact with the search engine.
You would think that an algorithm that attempts to filter adult images from the view of children and people who don’t want to see such images would have a visual component to it – that it would try to understand the pictures in question. The following patent application, invented by two Ask.com employees, has no such visual aspect, but relies instead upon user behavior to gauge whether or not an image contains adult material.
There’s a decent possibility that adult content may be returned in response to a query even if the search terms used gave no obvious sign that the searcher intended to request such information. This patent is aimed at gaining more control over what images might be returned during a search, and whether or not they are appropriate for the audience viewing those results.
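To give a feel for how behavior alone could flag an image, here is a minimal Python sketch. It is not the method from the patent application, which this excerpt doesn't spell out. I'm assuming a click log of (query, image) pairs and a small seed set of queries already known to be adult in nature. The function names, the `min_clicks` cutoff, and the `threshold` value are all hypothetical.

```python
from collections import defaultdict


def adult_scores(click_log, adult_queries, min_clicks=3):
    """Score each image by the share of its clicks that followed adult queries.

    Images with fewer than `min_clicks` clicks are left unscored, since a
    click or two says little about what an image shows.
    """
    total = defaultdict(int)
    adult = defaultdict(int)
    for query, image_id in click_log:
        total[image_id] += 1
        if query.lower() in adult_queries:
            adult[image_id] += 1
    return {img: adult[img] / count for img, count in total.items() if count >= min_clicks}


def filter_results(image_ids, scores, threshold=0.5):
    """Keep only images scoring below the threshold; unscored images pass through."""
    return [img for img in image_ids if scores.get(img, 0.0) < threshold]


# Hypothetical click log: img1 is mostly clicked after a seed adult query,
# img2 never is, and img3 has too few clicks to score.
log = (
    [("adult term", "img1")] * 3
    + [("beach", "img1")]
    + [("beach", "img2")] * 3
    + [("adult term", "img3")]
)
scores = adult_scores(log, {"adult term"})
safe_results = filter_results(["img1", "img2", "img3"], scores)
```

Notice that `img1` is filtered out of results for an innocent query like "beach" because of how people reached it elsewhere, which is the kind of case the paragraph above describes. Nothing in the sketch looks at the image itself.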
Continue reading “Ask.com Using Queries to Detect and Filter Adult Content?”