Information about where searchers hover their mouse pointers over different parts of search results, as well as over advertisements and Google Onebox results, may be collected by the search engine and used as ranking signals to help determine how relevant Google users find those items in response to a search query.
When I view the contents of a web page, I often find myself moving my mouse pointer along the areas that I am viewing. There are a couple of reasons for this. One is that it makes it easier to focus upon the part of the page that I’m looking at. Another is that it’s easier to click upon a link that I find interesting if my pointer is near what I’m viewing.
According to Google, I may not be alone in this kind of behavior. Google may track mouse movements on its search results pages to help rank pages that show up in search results, to determine the quality of sponsored ads within those search results, and to decide whether or not showing onebox results such as maps or definitions or news or stock quotes is appropriate for some search queries.
When Google ranks web pages, it considers a wide range of ranking signals, such as how relevant a page might be to keywords used by a searcher, the quality and quantity of links pointing to that page, and user-behavior data collected about that page.
A number of patent filings and whitepapers from Google have told us that Google might collect a fair amount of user-behavior data about how we browse web pages, such as how long we might spend on pages, how far we might scroll down those pages, which pages we might click upon in search results, which pages we might not click upon, which links we might follow when we visit pages, whether we print or bookmark or save pages, and more.
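To make the idea of hover data as a ranking signal a little more concrete, here is a minimal sketch. It assumes a hypothetical log of hover events (query, result ID, and milliseconds the pointer rested over that result), turns them into each result's share of total hover time for a query, and blends that share into a base relevance score. The `HoverEvent` record, the `hover_share` and `blend` helpers, and the 0.2 blending weight are all illustrative assumptions, not anything Google has described.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HoverEvent:
    # Hypothetical log record: which result the pointer rested over, and for how long.
    query: str
    result_id: str
    dwell_ms: int


def hover_share(events, query):
    """Fraction of total hover time on one query's results that each result received."""
    totals = defaultdict(int)
    for event in events:
        if event.query == query:
            totals[event.result_id] += event.dwell_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {rid: ms / grand_total for rid, ms in totals.items()}


def blend(base_scores, shares, weight=0.2):
    """Mix a base relevance score with hover share; the weight is an arbitrary assumption."""
    return {
        rid: (1 - weight) * score + weight * shares.get(rid, 0.0)
        for rid, score in base_scores.items()
    }


events = [
    HoverEvent("virginia", "a", 300),
    HoverEvent("virginia", "b", 100),
    HoverEvent("maine", "a", 999),  # different query, ignored below
]
shares = hover_share(events, "virginia")
print(shares)                                 # {'a': 0.75, 'b': 0.25}
print(blend({"a": 0.5, "b": 0.6}, shares))    # a moves ahead of b after blending
```

In this toy run, result "a" starts behind "b" on relevance alone but edges ahead once the hover share is blended in, which is the kind of reordering a hover signal could cause.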
Continue reading Where you Point Your Mouse May Influence Google Search Rankings, Advertisement Placement, and Oneboxes
When you search on Bing, sometimes instead of seeing an ordered list of search results, you might see search results broken up into categories. For example, if you search for “Virginia,” your search results start off with an image and link to the state web site, as well as a map. You then see a couple of search results that look pretty relevant for the term.
What comes next is a little interesting. Instead of showing you just more links to web pages like you might see at Google or Yahoo, Bing starts showing you groupings of additional web pages organized by category. There’s a Virginia map category, then Virginia Tourism followed by Virginia Facts, then Virginia Jobs, and finally, Virginia History.
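The layout described above (a few top results, followed by groups of results under category labels) can be sketched in a few lines. This assumes each result already carries a category label, which is the hard part and is not shown here; the `categorize` function and the `top_n=2` default are hypothetical. Categories appear in the order of their best-ranked member, so the ranking still shapes which group comes first.

```python
def categorize(results, top_n=2):
    """Split ranked (title, category) pairs into a flat head plus category groups.

    The first `top_n` results stay as an ordinary ranked list. The rest are
    grouped by category; because dicts keep insertion order and we walk the
    results in rank order, each category is placed by its best-ranked member.
    """
    head = list(results[:top_n])
    groups = {}
    for title, category in results[top_n:]:
        groups.setdefault(category, []).append(title)
    return head, groups


# Illustrative data only; the titles and labels are made up.
results = [
    ("Virginia.gov", "official"),
    ("Virginia overview", "facts"),
    ("VA map", "Virginia Maps"),
    ("Tourism VA", "Virginia Tourism"),
    ("Another map", "Virginia Maps"),
]
head, groups = categorize(results)
print(head)
print(groups)  # {'Virginia Maps': ['VA map', 'Another map'], 'Virginia Tourism': ['Tourism VA']}
```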
This diversification and grouping of search results is a departure from the single ranked list of results that many search engines show. When a query term might have more than one meaning, or different categories of results might be equally useful to searchers, Bing may decide to present those search results in different categories, like it does on a search for Virginia. Here’s the first category shown in the Bing results on a search for Virginia:
Continue reading Bing’s Categorized Search Results
How does a search engine choose whether to show news items in web search results and when not to?
If you live in Bealton, Virginia, chances are that you may not be too interested in news of a car crash in Brooklyn, New York, when searching for information about Brooklyn. If you’re from Brooklyn, and want to find vacation information about the parks in Wisconsin, you may not be very concerned about the latest winning numbers in the Wisconsin lottery. Yet someone searching for information about one of the states bordering the Gulf of Mexico these days might be likely to want to see news about the oil spill in the region.
A Yahoo patent filing published recently describes how they might use a prediction system based upon the search engine’s query logs to decide whether or not to show news results. The prediction system uses a mixture of geographic information related to queries and to searchers as well as information about how “newsworthy” a location might be to make that determination. The patent tells us that it might create similar prediction models to determine whether or not to show other types of results as well. The patent application is:
System and Method of Geo-Based Prediction in Search Result Selection
Invented by Rosie Jones, Fernando Diaz, and Ahmed Hassan Awadallah
US Patent Application 20100161591
Published June 24, 2010
Filed December 22, 2008
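A prediction system like the one described in this patent application could take many forms. The sketch below is one hypothetical shape: a logistic model over two made-up features, the distance between the searcher and the place named in the query, and a "newsworthiness" score for that place. The feature names, the weights, and the 0.5 threshold are all assumptions for illustration; a real system would learn its weights from query logs, and the patent may use entirely different features.

```python
import math
from dataclasses import dataclass


@dataclass
class GeoFeatures:
    # Hypothetical features, not taken from the patent filing.
    searcher_query_distance_km: float  # searcher's distance from the place named in the query
    location_newsworthiness: float     # 0..1, how "newsworthy" that place currently is


# Hypothetical hand-picked weights; a real system would learn these from logs.
WEIGHTS = {"bias": -1.0, "distance": -0.0005, "newsworthiness": 4.0}


def news_probability(f: GeoFeatures) -> float:
    """Logistic estimate that a searcher wants news results for this query."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["distance"] * f.searcher_query_distance_km
         + WEIGHTS["newsworthiness"] * f.location_newsworthiness)
    return 1.0 / (1.0 + math.exp(-z))


def should_show_news(f: GeoFeatures, threshold: float = 0.5) -> bool:
    return news_probability(f) >= threshold


# Illustrative numbers only: a distant searcher asking about a quiet place,
# versus a distant searcher asking about a place dominating the news.
print(should_show_news(GeoFeatures(400, 0.1)))   # False
print(should_show_news(GeoFeatures(1500, 0.9)))  # True
```

The point of the second case is that high newsworthiness can outweigh distance, which matches the oil spill example, while the first case shows distance and low newsworthiness keeping news out of the results.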
Continue reading How Search Engines May Use Geography and Population Info in Deciding to Show News in Web Searches
Optical Character Recognition, or OCR, is a technology that can enable a computer to look at pictures that include text and translate those visual representations of text into actual text. If you have words within images on your web pages, there’s a good chance that search engines are ignoring those words when it comes to indexing your pages.
But that might change sometime in the future.
While OCR has been around for a while, search engines haven’t been using the technology when crawling and indexing the content of Web pages. Google’s webmaster guidelines tell us:
Try to use text instead of images to display important names, content, or links. The Google crawler doesn’t recognize text contained in images. If you must use images for textual content, consider using the “ALT” attribute to include a few words of descriptive text.
Yahoo’s page, How to Improve the Position of Your Website in Yahoo! Search Results, provides the following tip:
Continue reading Teaching Computers to Read Newspapers: How a Search Engine Might Use OCR to Index Complex Printed Pages
Earlier this month I wrote about a granted Google patent, and a continuation of that patent filed earlier this year, that describe How Google Might Suggest Topics for You to Write About, by providing information to web publishers on queries and topics that are either under-represented in search results or where there’s more demand for information about those topics or queries than there are search results to meet that demand.
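One simple way to picture "more demand than there are search results to meet that demand" is a ratio of search volume to available results. The sketch below ranks queries by searches per result, skipping low-volume queries. The ratio, the `+ 1` smoothing, the `min_volume` cutoff, and the `content_gaps` function are assumptions chosen for illustration; the patents may measure demand and supply quite differently.

```python
def content_gaps(query_volume, result_counts, min_volume=100):
    """Rank queries by searches per available result; a high ratio suggests an under-served topic.

    query_volume:  hypothetical mapping of query -> number of searches
    result_counts: hypothetical mapping of query -> number of good results available
    """
    gaps = []
    for query, volume in query_volume.items():
        if volume < min_volume:
            continue  # too little demand to be worth writing about
        supply = result_counts.get(query, 0)
        gaps.append((query, volume / (supply + 1)))  # +1 avoids division by zero
    return sorted(gaps, key=lambda item: item[1], reverse=True)


volume = {"fix squeaky hinge": 1000, "virginia history": 500, "rare topic": 50}
supply = {"fix squeaky hinge": 9, "virginia history": 499}
print(content_gaps(volume, supply))
# [('fix squeaky hinge', 100.0), ('virginia history', 1.0)]
```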
The topic struck home with a number of people, especially journalists, and I had a chance to have a conversation with Financial Times (FT.com) reporter Kenneth Li about Google’s patents. The Financial Times ran with two different stories on the topic (Google shadow over new media groups, and Google eyes Demand Media’s way with words), focusing primarily on how the technology involved in the patents could bring Google into competition with companies such as Demand Media, Associated Content, and AOL.
While searching through patent filings this morning, I came across an interesting newly published patent application from Demand Media. In the FT.com article on Demand Media, we’re told that:
Continue reading How Demand Media May Target Keywords for Profitability
When a search engine shows you results for a search, the pages shown are likely ordered by a mix of relevance and importance.
But a search engine doesn’t usually stop there. It may look at other things to filter and reorder search results.
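The two stages described above, an initial ordering by relevance and importance followed by a pass that filters and reorders, can be sketched as follows. The linear mix with `alpha=0.7` is an assumption, and the second pass uses host crowding (capping how many results any one site gets) purely as one example of a reranking step; it is not meant to stand for any particular method from the patents.

```python
from urllib.parse import urlparse


def initial_ranking(pages, alpha=0.7):
    """Order pages by a weighted mix of relevance and importance (alpha is an assumption)."""
    return sorted(
        pages,
        key=lambda p: alpha * p["relevance"] + (1 - alpha) * p["importance"],
        reverse=True,
    )


def limit_per_host(ranked, max_per_host=2):
    """Example rerank/filter pass: keep at most `max_per_host` results from any one host."""
    seen = {}
    kept = []
    for page in ranked:
        host = urlparse(page["url"]).netloc
        if seen.get(host, 0) < max_per_host:
            kept.append(page)
            seen[host] = seen.get(host, 0) + 1
    return kept


pages = [
    {"url": "http://x.com/1", "relevance": 0.90, "importance": 0.5},
    {"url": "http://x.com/2", "relevance": 0.80, "importance": 0.6},
    {"url": "http://x.com/3", "relevance": 0.85, "importance": 0.4},
    {"url": "http://y.com/", "relevance": 0.50, "importance": 0.9},
]
final = limit_per_host(initial_ranking(pages))
print([p["url"] for p in final])  # ['http://x.com/1', 'http://x.com/2', 'http://y.com/']
```

Here the third page from x.com would have ranked above y.com on score alone, but the second pass drops it so another site can appear.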
In 2006, I wrote 20 Ways Search Engines May Rerank Search Results, which described a number of ways that search engines may rerank pages. I followed that up in 2007 with 20 More Ways that Search Engines May Rerank Search Results.
I decided that it was time for a sequel or two in this series. I came up with another 25 reranking methods, but decided to stop at 10 in this post.
Many of the following are described in patents, and some of those patents were originally filed years ago – prehistoric times in Web years. The search engines may have incorporated ideas from those patents into what they are doing now, adopted those methods and since moved on to something new, or put them in a filing cabinet somewhere and forgot about them (I’d like the key to that filing cabinet).
Continue reading Another 10 Ways Search Engines May Rerank Search Results
This post may get you thinking about the benefits of using heading elements and lists on web pages for SEO purposes from a slightly different perspective than you may be used to.
Google uses a large number of signals to decide upon the order of pages shown in search results. Some of those signals measure the quality or importance of a web page, while others may indicate how relevant a page is for a particular search query entered into a search engine’s search box.
One fairly obvious relevancy signal is whether or not the words in a query actually appear upon a page that might be a search result for that query. If those words appear on the page more than once, the page might be considered even more relevant for that particular query than other web pages where the terms only appear once, or not at all.
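Counting how often query terms appear on a page, as described above, is easy to sketch, and the same token positions also give a naive measure of how close two terms are: the fewest words separating any occurrence of one from any occurrence of the other. The regex tokenizer and the function names here are hypothetical, and this plain word-counting deliberately ignores page formatting such as headings and lists.

```python
import re


def tokenize(text):
    """Hypothetical tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_counts(text, query):
    """How many times each query term appears in the text."""
    tokens = tokenize(text)
    return {term: tokens.count(term) for term in tokenize(query)}


def min_word_gap(text, term_a, term_b):
    """Fewest words between any occurrence of term_a and term_b, or None if either is missing."""
    if term_a == term_b:
        raise ValueError("need two distinct terms")
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    if not pos_a or not pos_b:
        return None
    # Adjacent words have a gap of 0.
    return min(abs(i - j) - 1 for i in pos_a for j in pos_b)


page = "Ford trucks and used cars. Used Ford parts."
print(term_counts(page, "used ford"))      # {'used': 2, 'ford': 2}
print(min_word_gap(page, "used", "ford"))  # 0
```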
Another factor that might indicate how relevant a page is for a particular set of terms is how close those terms might be on a page. While you could easily count the number of words between individual query terms to determine how close they are to each other, the formatting of web pages presents some challenges to the approach of simply counting words between terms, such as in a list like the following:
Continue reading Google Defines Semantic Closeness as a Ranking Signal
Not every link from a page in a link-based ranking system is equal, and a search engine might look at a wide range of factors to determine how much weight each link on a page may pass along.
One of the signals used by Google to rank web pages looks at the links to and from those pages, to see which pages are linked to by others. Links from “important” pages carry more weight than links from less important pages. An important page under this system is one that is linked to by other important pages, or by a large number of less important pages, or a combination of the two. This signal is known as PageRank, and it is only one of a large number of Google ranking signals used to rank web pages and determine how highly those pages show up in search results in response to a query from a searcher.
An early paper by the founders of Google, The Anatomy of a Large-Scale Hypertextual Web Search Engine, tells us:
PageRank can be thought of as a model of user behavior. We assume there is a “random surfer” who is given a web page at random and keeps clicking on links, never hitting “back” but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank.
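The random surfer model in that quote can be sketched with power iteration. This uses a common textbook form, where the surfer follows a link with probability `damping` (0.85, the value suggested in the Brin and Page paper) and otherwise jumps to a random page, normalized so the ranks sum to 1. Pages with no outlinks spread their rank evenly, which is one common convention and an assumption here. Every link on a page gets an equal share in this version; weighting links differently based on link and document features, as the reasonable surfer approach does, would amount to replacing that equal split with per-link weights.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Random-surfer PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # With probability (1 - damping) the bored surfer starts over on a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link gets an equal share of this page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # No outlinks: treat the page as linking to every page.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```

Page "c" ends up on top because it is linked to by two pages, and "a" benefits from being linked to by the important page "c", which mirrors the idea that links from important pages carry more weight.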
Continue reading Google’s Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data