There may be more than one URL for a single page on a website, which can cause problems when a search engine attempts to crawl and index pages on that site.
If the search engine can work out rules for how these different URL versions of a page come about, and pick a single version to index, then it can save time and processing power by crawling and indexing only that one version.
The “canonical” version of a URL is that single standard version, chosen when there is more than one way to represent the URL (or address) of a page.
Web crawlers can download only a finite number of documents or web pages in a given amount of time. Therefore, it would be advantageous if a web crawler could identify URL equivalence patterns in multiple different URLs that reference substantially identical pages and download only one document, as opposed to downloading all the substantially identical documents addressed by the multiple different URLs.
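To make the idea of URL equivalence rules concrete, here is a minimal sketch of hand-written normalization rules in Python. It is not the method described in the Microsoft patent filing, which is about discovering such patterns; the function name `canonicalize` and the list of ignored parameters are assumptions made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical query parameters assumed not to change what the page shows.
IGNORED_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Map equivalent URL variants to one standard form (illustrative rules only)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # user:password info is dropped here

    # Keep the port only when it is not the default for the scheme.
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"

    path = parts.path or "/"

    # Drop ignored parameters and sort the rest so ordering does not matter.
    pairs = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )

    # The fragment is discarded because it is not sent to the server.
    return urlunsplit((scheme, netloc, path, urlencode(pairs), ""))


variants = [
    "HTTP://Example.com:80/page?b=2&a=1&sessionid=xyz#top",
    "http://example.com/page?a=1&b=2",
]
unique = {canonicalize(u) for u in variants}
```

Both variants collapse to the same string, so a crawler that uses rules like these would fetch the page once rather than twice. Which parameters are actually safe to ignore differs from site to site, which is why learning the rules per site would be more useful than a fixed list.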
Continue reading “Microsoft Creating Rules for Canonical URLs”
I posted on Tuesday about the use of Topic Familiarity in Reranking Search Results. Stephen Pitts, of Build a Better Website, asked if I thought the patent application being discussed in that post had anything to do with personalized search.
My response was that the inventors listed in the document stated within it that they weren’t looking at specific search terms, user queries, or user behavior when ranking pages to determine whether those were “introductory” or “advanced” pages. The question stuck with me though, and while looking at some other papers on the web, I noticed a Yahoo paper from the beginning of 2005 that shared an author with the patent, Omid Madani. The paper was an update of Yahoo’s personalization efforts for a January, 2005, conference titled Beyond Personalization 2005.
I decided that the paper might be worth writing about here, but since it was from early 2005, I thought I should look for some other documents from the company about personalization. A search turned up some job listings at Yahoo that I thought were interesting.
Yahoo Job Listings
Continue reading “A Peek at Personalization at Yahoo”
Unfamiliar with a topic, and want to find a simple page on the subject, one that doesn’t require background reading or knowledge to understand?
More familiar with that subject, and want to find an advanced page on the web?
Could a search engine help you find pages and rerank them based on how familiar you say you are with the topic of your query? It’s possible.
A search engine might pay attention to the following when indexing pages:
- Reading levels for the page,
- The lengths of sentences in words, and other features of text on the page,
- How simple or complex the stopwords* used on a page may be.
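As a rough illustration of the kinds of signals in the list above, here is a small Python sketch that computes a few text features for a page. It is an assumption-laden stand-in, not the scoring described in the patent application: the feature names, the tiny stopword list, and the use of a stopword ratio as a proxy for stopword complexity are all hypothetical.

```python
import re

# Tiny illustrative stopword list; a real system would use a far larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that"}


def text_features(text: str) -> dict:
    """Return simple readability-style features for a block of text."""
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())

    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0, "stopword_ratio": 0.0}

    stopword_count = sum(1 for word in words if word in STOPWORDS)
    return {
        # Average number of words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
        # Average number of characters per word.
        "avg_word_len": sum(len(word) for word in words) / len(words),
        # Share of words that are stopwords (a crude proxy, see lead-in).
        "stopword_ratio": stopword_count / len(words),
    }


features = text_features("The cat sat. It is a cat.")
```

A reranker could compare features like these across the pages returned for a query and move pages with shorter sentences and shorter words up for a searcher who says they are new to the topic, or down for one who wants advanced material.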
Continue reading “How to Use Topic Familiarity to Rerank Search Results”
I’m happy to announce that some SEO by the Sea posts will soon be appearing in Spanish on OJObuscador.com. OJObuscador is one of the top Spanish-language blogs focusing on search-related topics, and I am looking forward to sharing some words and ideas with its readers.
Tomy Lorsch contacted me earlier today and asked me if I would be interested in having posts from here show up on OJObuscador in Spanish. Excited by the prospect, I agreed. The announcement at OJObuscador is titled “Muy pronto: Bill Slawski en OJObuscador” (“Coming soon: Bill Slawski on OJObuscador”).
Thank you, Tomy.
A good percentage of all searches on the major search engines involve people looking for something in a specific geographic location.
Understanding how search engines find and extract geographic information on web sites, handle that location information, and present it to searchers is one of the most important and possibly fastest growing areas in search, especially with the growth of web access on phones and PDAs.
Mike Blumenthal has started a new blog, Understanding Google Local & Yahoo Local, to focus on issues around how search engines handle local search. I’m looking forward to his posts.
His latest asks Does Google Maps have a sandbox?
Continue reading “New Blog on Local Search”
Rand, over at 14th Colony, asked about the ruling against Google by the Court of First Instance in Brussels (Belgium), and its translation into English. I found a copy of the ruling at ChillingEffects.org as an image PDF file. I’ve transcribed the part of it that details the Court’s ruling in English.
Some interesting points, before the transcription:
I see the text “Defendant Defaulting” early on, which might lead one to believe that Google didn’t appear in Court for this hearing. I don’t know if that is true, but the “defaulting” language would lead me to believe that.
Who was the expert the Court depended upon?
Continue reading “Belgian Copyright Ruling Against Google News”