This is the first in a series of posts on Google Custom Search Engines.
If you’re interested in how search works on the Web, you may want to spend some time exploring Google Custom Search. It enables you to create a site search for an individual site, or a customized search engine on specific topics that may focus upon a number of sites that you can select.
There’s another reason to start looking at Google Custom Search Engines, or CSEs. A recently published patent application from Google describes how the search engine may use information from CSEs to influence what we might see in Google’s Web search. This post is an introduction to the topic, and it covers how search engines attempt to identify the intent behind queries and web pages.
The patent application, Aggregating Context Data for Programmable Search Engines, includes a fairly well written statement (for a patent application) about one of the difficulties that search engines face when trying to come up with results to show searchers in response to queries. I thought it was worth sharing here, and it provides a nice introduction to a longer exploration of how Google CSEs might be used to improve web search.
Continue reading Assumptions behind Web Searches
Last week, I wrote about a patent granted to Google which described how the search engine may use categories as a search ranking factor to decide whether or not to include some pages in search results for specific queries. The patent was originally filed back in 2004, and focused primarily upon classifying documents based upon things such as the contents of web pages and anchor text in links pointing to pages.
A few days ago, a new patent application was published by Google which focuses upon classification of documents based upon a wider range of information, including user behavior data. Instead of a simple matching of weighted classifications between web pages and queries, the patent filing describes a way of creating profiles for pages which include classification information, and spreading that classification information to unclassified pages through query profiles for queries which both types of pages rank for in search results.
This kind of user-data based profile information could be used along with more conventional ways of ranking pages to improve the quality of search results, and to provide more personalized results to searchers. The patent application is:
Continue reading Improved Web Page Classification from Google for Rankings and Personalized Search
Does Google determine categories for pages and for queries, and can those play a role in how it ranks pages in search results?
Almost every day, I receive visitors on a query for “bookshelf plans,” on the strength of a past post about Google’s plans for virtual bookshelves in Google library. Most of those visitors probably aren’t surprised that the page is about an online library given the title and snippet appearing for the post, but most of the search results preceding it describe wooden rather than virtual shelves. My page really doesn’t fit within the same category as the others.
When a search engine determines whether a page is relevant for a certain query, it does more than try to match the text of the query with a page that contains that text and look at the links pointing to the page. A Google patent filed in 2004 and granted today describes how the search engine may try to associate web pages with categories, and queries with categories, and come up with a category score for each, to use to rank those pages for categories.
We are told that this kind of category matching addresses a couple of different problems.
Continue reading How Google May Use Categories as a Search Ranking Factor
In my RSS feed reader, I have a section that I labeled “Vanity.” The feeds that occupy it are things like web search and Twitter search feeds for my name, my sites’ names, my business name, and some other searches that interest me on the Web. I don’t really consider tracking these things to be a matter of vanity, but instead of necessity – a way to find conversations that might involve me, my site, and my business, and a chance to possibly get involved in those discussions.
As a site owner, I’ve also developed a habit that many site owners likely also share, of performing searches for queries such as my name, my sites’ names, my business name, and some other queries that I’m interested in. The exercise isn’t one based upon obsession with ranking as much as it is about being concerned about those conversations that I mentioned above, and concerned about how the search engines might be portraying my sites. For instance, when I search for my site name (seo by the sea), and Google shows a snippet that starts off with the date “Mar 8, 2005,” I find myself concerned about what that might mean to people who see that date.
Continue reading Bad Dates in Google Snippets: Hey Google, I’ve Blogged a Little Since 2005!
If you search for the word “cold” and you’re using the search box for a health-related site, chances are you want to find out something about the illness. If you search for “cold” at Google or Yahoo or Bing, there’s a chance that you might be interested in weather or air conditioning or a cold war or a stuffy nose.
Different sites and pages might focus upon specific topics of interest, such as health, sports, weather, or construction. One way a search engine might try to get around some of the limitations of words with multiple meanings is to assign domain or topical scores to web pages and other items found on the Web, regardless of which queries they might be good results for. Then, if a query seems to cover a specific domain or topic, the search engine can return pages that involve that topic, based upon a “domain score” for those pages.
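To make the idea of a “domain score” a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the method described in the patent: the Page class, the rerank function, the 50/50 weighting, the example URLs, and all of the scores are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    text_score: float  # hypothetical score for how well the page matches the query text
    # Hypothetical per-topic "domain scores", assigned independently of any query.
    domain_scores: dict = field(default_factory=dict)


def rerank(pages, query_topic, weight=0.5):
    """Order pages by a blend of text match and domain score for the query's topic.

    The linear blend and the default weight are assumptions made for illustration.
    """
    def combined(page):
        topic_score = page.domain_scores.get(query_topic, 0.0)
        return (1 - weight) * page.text_score + weight * topic_score

    return sorted(pages, key=combined, reverse=True)


# Three pages that all match the word "cold" reasonably well.
pages = [
    Page("weather.example/cold-front", 0.90, {"weather": 0.95, "health": 0.05}),
    Page("health.example/common-cold", 0.80, {"health": 0.90}),
    Page("history.example/cold-war", 0.85, {"history": 0.90}),
]

# A query for "cold" issued from a health site would likely be treated as a health query.
ranked = rerank(pages, "health")
print([p.url for p in ranked])
```

In this sketch, the health page moves to the top for a health-oriented query even though its plain text-match score is the lowest of the three, which is the kind of reranking the paragraph above describes.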
Why Look at Domains (Categories of Interest) in Ranking Pages?
The patent’s description begins by describing conventional methods of ranking pages in search results. When a search engine attempts to match a query with a document, there are a number of steps that it may go through first.
Continue reading How a Search Engine Might Rerank Search Results Based upon Topics
I thought it might be fun to put together an SEO Quiz.
How many of the following can you get right?
I’ll post the answers later. The answers are now listed, after a spoiler warning below.
1. Stanford University’s PageRank is named after?
a. Ranking Web Pages
b. Satchel Page
c. Larry Page
d. The Palo Alto Gradient Evaluation
e. None of the above
2. Which of the following search engine crawling models has not been proposed in either an academic paper or patent for emulating how people might visit web pages?
a. Random Surfer
b. Rowdy Surfer
c. Cautious Surfer
d. Reasonable Surfer
e. None of the above
3. Which company wasn’t started by two students who walked away from finishing their degrees?
Continue reading SEO Quiz
Might Google rank links to pages differently based upon a perception of how related or affiliated those pages might be to each other? For instance, if three pages authored by the same person link to a fourth page, and two other pages, each written by other people, also link to that fourth page, should the three links from the same author count as passing along three times as much link weight as the links from the independently written pages?
A patent granted to Google today shows how the search engine might analyze how “affiliated” pages or sites are to each other, and how their degree of affiliation might influence the amount of weight passed along by each link.
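The question about the three same-author links can be worked through with a small Python sketch. This is an illustration only: the link_weight function, the affiliated_discount parameter, and the author names are hypothetical, and the discounting rule is an assumption rather than the formula from the patent.

```python
from collections import Counter


def link_weight(linking_authors, affiliated_discount=1.0):
    """Total weight passed to a target page by a list of inbound links.

    Each entry in linking_authors is the (hypothetical) author of one linking page.
    The first link from an author counts in full; each additional link from the
    same author is reduced by affiliated_discount (1.0 = ignored, 0.0 = full weight).
    This rule is an assumption made for illustration.
    """
    total = 0.0
    for _author, count in Counter(linking_authors).items():
        total += 1.0 + (count - 1) * (1.0 - affiliated_discount)
    return total


# Three links from one author plus one link each from two independent authors.
authors = ["alice", "alice", "alice", "bob", "carol"]

print(link_weight(authors, affiliated_discount=0.0))  # every link counts in full
print(link_weight(authors, affiliated_discount=1.0))  # affiliated links count once
print(link_weight(authors, affiliated_discount=0.5))  # partial discount
```

With no discount the fourth page receives five units of weight, and with a full discount it receives three, one per independent author. A real system would presumably estimate a degree of affiliation rather than rely on a known author name.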
Continue reading Google’s Affiliated Page Link Patent
Your website may be invaded by robots at any time. If you’re lucky, that is – at least if you want people to visit you from places like Google or Yahoo or Bing. And if the visiting robots are polite.
In the early days of the Web, automated programs known as robots, or bots, were created to find information on the Web, and to create indexes of that information. They would do this regardless of whether you wanted them to visit your pages or not, and you had no way to tell them not to go through your web site.
If you search through Usenet message boards from the early days of the Web, you might come across a document such as the World Wide Web Frequently Asked Questions (FAQ), Part 1/2 (December, 1994), which describes robots in those days:
4.10: Hey, I know, I’ll write a WWW-exploring robot! Why not?
Continue reading Google Patent Granted on Polite Web Crawling