This is the third and final (for now) part in a series on Google Custom Search, and how information from custom search engines might be used in Google’s Web search.
In the first part of this series, SEO and Assumptions behind Web Searches, I described some assumptions search engineers often make that are challenged by a recently published Google patent application, Aggregating Context Data for Programmable Search Engines.
Quickly, those questioned assumptions are:
- Search engines should avoid using information from external sources in learning how people search
- User data collected about a searcher’s past searches and browsing behavior can help identify the intent of that searcher during new searches
- User data collected about specific searchers, queries, and web sites can also be aggregated to help understand the intent behind a search
Continue reading The Expertise of Google Custom Search Engines vs. the Wisdom of Crowds
This is the second part of a series on Google Custom Search Engines.
Why spend so much time looking at Google Custom Search? Here are a few reasons which I’ve written about in previous posts:
- Google Subscribed Links, which can be created in Google Custom Search, sometimes appear in Google’s Web Search even if you don’t subscribe to those links.
- Google’s patent describing their Trust Rank approach explores how the kind of labels used as annotations by trusted sources (such as some Custom Search Engine builders) might influence web search results.
- Another patent application from Google explains how labels, which can be created in Google Custom Search, might affect the classification of Web pages by Google, and help to define query refinements that appear above Web search results, as does an additional granted Google Patent describing how Google might be Filtering search results using annotations.
Continue reading Is Google Custom Search Influencing Google Web Search?
This is the first in a series of posts on Google Custom Search Engines.
If you’re interested in how search works on the Web, you may want to spend some time exploring Google Custom Search. It enables you to create a site search for an individual site, or a customized search engine on specific topics that may focus upon a number of sites that you can select.
There’s another reason to start looking at Google Custom Search Engines, or CSEs. A recently published patent application from Google describes how the search engine may use information from CSEs to influence what we might see in Google’s Web search. This post is an introduction to the topic, and it covers how search engines attempt to identify the intent behind queries and web pages.
The patent application, Aggregating Context Data for Programmable Search Engines, includes a fairly well written statement (for a patent application) about one of the difficulties that search engines face when trying to come up with results to show searchers in response to queries. I thought it was worth sharing here, and it provides a nice introduction to a longer exploration of how Google CSEs might be used to improve web search.
Continue reading Assumptions behind Web Searches
Last week, I wrote about a patent granted to Google which described how the search engine may use categories as a search ranking factor to decide whether or not to include some pages in search results for specific queries. The patent was originally filed back in 2004, and focused primarily upon classifying documents based upon things such as the contents of web pages and anchor text in links pointing to pages.
A few days ago, a new patent application was published by Google which focuses upon classification of documents based upon a wider range of information, including user behavior data. Instead of a simple matching of weighted classifications between web pages and queries, the patent filing describes a way of creating profiles for pages which include classification information, and spreading that classification information to unclassified pages through query profiles for queries which both types of pages rank for in search results.
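The spreading step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the patent filing: the function names (`average_profiles`, `propagate`), the use of simple averaging, and the sample pages and queries are all hypothetical.

```python
from collections import defaultdict

def average_profiles(profiles):
    """Average a list of {category: weight} dicts into a single profile."""
    if not profiles:
        return {}
    totals = defaultdict(float)
    for profile in profiles:
        for category, weight in profile.items():
            totals[category] += weight
    return {category: total / len(profiles) for category, total in totals.items()}

def propagate(page_profiles, rankings):
    """Infer profiles for unclassified pages from the queries they rank for.

    page_profiles: {page: {category: weight}} for already classified pages.
    rankings: {query: [pages that rank for that query]}.
    Simple averaging is an assumption made for this sketch.
    """
    # Step 1: each query's profile is the average of its classified pages.
    query_profiles = {
        query: average_profiles([page_profiles[p] for p in pages if p in page_profiles])
        for query, pages in rankings.items()
    }
    # Step 2: each unclassified page inherits the average of its queries' profiles.
    all_pages = {p for pages in rankings.values() for p in pages}
    inferred = {}
    for page in all_pages - set(page_profiles):
        related = [
            query_profiles[query]
            for query, pages in rankings.items()
            if page in pages and query_profiles[query]
        ]
        inferred[page] = average_profiles(related)
    return inferred

# Hypothetical data: two classified pages and one unclassified page.
page_profiles = {
    "a.example": {"health": 1.0},
    "b.example": {"health": 0.5, "weather": 0.5},
}
rankings = {
    "cold symptoms": ["a.example", "c.example"],
    "cold": ["a.example", "b.example", "c.example"],
}
inferred = propagate(page_profiles, rankings)
```

Here the unclassified page `c.example` ends up leaning toward "health" because it shares queries with pages already classified that way.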
This kind of user-data based profile information could be used along with more conventional ways of ranking pages to improve the quality of search results, and to provide more personalized results to searchers. The patent application is:
Continue reading Improved Web Page Classification from Google for Rankings and Personalized Search
Does Google determine categories for pages and for queries, and can those play a role in how it ranks pages in search results?
Almost every day, I receive visitors on a query for “bookshelf plans,” on the strength of a past post about Google’s plans for virtual bookshelves in Google library. Most of those visitors probably aren’t surprised that the page is about an online library given the title and snippet appearing for the post, but most of the search results preceding it describe wooden rather than virtual shelves. My page really doesn’t fit within the same category as the others.
When a search engine determines whether a page is relevant for a certain query, it does more than try to match the text of the query with a page that contains that text and look at the links pointing to the page. A Google patent filed in 2004 and granted today describes how the search engine may try to associate web pages with categories, and queries with categories, and come up with a category score for each, to use to rank those pages for categories.
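As a rough illustration of category matching, here is a minimal Python sketch. It assumes that both queries and pages carry category weights and that the match is a simple weighted overlap; the function name `category_match`, the weights, and the example URLs are hypothetical and not taken from the patent.

```python
def category_match(query_categories, page_categories):
    """Sum the products of weights for categories shared by a query and a page."""
    return sum(
        weight * page_categories.get(category, 0.0)
        for category, weight in query_categories.items()
    )

# Hypothetical category weights for the query "bookshelf plans".
query = {"woodworking": 0.8, "furniture": 0.6}

# Hypothetical category weights for two pages that both contain the query text.
pages = {
    "diy.example/build-a-bookshelf": {"woodworking": 0.9, "furniture": 0.7},
    "blog.example/google-virtual-bookshelves": {"search engines": 0.9, "libraries": 0.5},
}

# Rank pages by how well their categories match the query's categories.
ranked = sorted(pages, key=lambda url: category_match(query, pages[url]), reverse=True)
```

In this sketch the post about virtual bookshelves scores zero on category match, even though it contains the words in the query. A real system would presumably blend such a score with text and link signals rather than use it alone.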
We are told that this kind of category matching addresses a couple of different problems.
Continue reading How Google May Use Categories as a Search Ranking Factor
In my RSS feed reader, I have a section that I labeled “Vanity.” The feeds that occupy it are things like web search and Twitter search feeds for my name, my sites’ names, my business name, and some other searches that interest me on the Web. I don’t really consider tracking these things to be a matter of vanity, but rather of necessity: a way to find conversations that might involve me, my site, and my business, and a chance to possibly get involved in those discussions.
As a site owner, I’ve also developed a habit that many site owners likely also share, of performing searches for queries such as my name, my sites’ names, my business name, and some other queries that I’m interested in. The exercise isn’t one based upon obsession with ranking as much as it is about being concerned about those conversations that I mentioned above, and concerned about how the search engines might be portraying my sites. For instance, when I search for my site name (seo by the sea), and Google shows a snippet that starts off with the date “Mar 8, 2005,” I find myself concerned about what that might mean to people who see that date.
Continue reading Bad Dates in Google Snippets: Hey Google, I’ve Blogged a Little Since 2005!
If you search for the word “cold” and you’re using the search box for a health related site, chances are you want to find out something about the illness. If you search for “cold” at Google or Yahoo or Bing, there’s a chance that you might be interested in weather or air conditioning or a cold war or a stuffy nose.
Different sites and pages might focus upon specific topics of interest, such as health or sports, or weather, or construction. One way a search engine might try to get around some of the limitations of words with multiple meanings is to assign domain or topical scores to web pages and other items found on the Web, regardless of which queries they might be good results for. Then, if a query seems to cover a specific domain or topic, the search engine can return pages that involve that topic, based upon a “domain score” for those pages.
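To make the idea of a “domain score” concrete, here is a minimal Python sketch of reranking. It assumes each page already has a conventional relevance score plus query-independent topic scores, and that the two are blended linearly; the `Page` class, the `rerank` function, the `weight` parameter, and the sample URLs are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    url: str
    base_score: float  # conventional relevance score for the query (assumed given)
    domain_scores: dict = field(default_factory=dict)  # topic -> score, independent of any query

def rerank(pages, query_domain, weight=0.5):
    """Sort pages by a linear blend of base relevance and the score for query_domain."""
    def combined(page):
        domain_score = page.domain_scores.get(query_domain, 0.0)
        return (1 - weight) * page.base_score + weight * domain_score
    return sorted(pages, key=combined, reverse=True)

# Hypothetical results for the query "cold".
pages = [
    Page("weather.example/cold-front", 0.9, {"weather": 0.9, "health": 0.1}),
    Page("health.example/common-cold", 0.8, {"health": 0.95}),
    Page("history.example/cold-war", 0.7, {"history": 0.9}),
]

# If the query appears to be about health, health pages move up.
health_results = [page.url for page in rerank(pages, "health")]
```

With the query treated as a health query, the common cold page moves ahead of the weather page despite its lower base score. The linear blend is just one simple choice for combining the two signals.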
Why Look at Domains (Categories of Interest) in Ranking Pages?
The patent’s description begins by describing conventional methods of ranking pages in search results. When a search engine attempts to match a query with a document, there are a number of steps that it may go through first.
Continue reading How a Search Engine Might Rerank Search Results Based upon Topics
I thought it might be fun to put together an SEO Quiz.
How many of the following can you get right?
I’ll post the answers later. The answers are now listed, after a spoiler warning below.
1. Stanford University’s PageRank is named after?
a. Ranking Web Pages
b. Satchel Page
c. Larry Page
d. The Palo Alto Gradient Evaluation
e. None of the above
2. Which of the following search engine crawling models has not been proposed in either an academic paper or patent for emulating how people might visit web pages?
a. Random Surfer
b. Rowdy Surfer
c. Cautious Surfer
d. Reasonable Surfer
e. None of the above
3. Which company wasn’t started by two students who walked away from finishing their degrees?
Continue reading SEO Quiz