Google’s Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data

Not every link from a page in a link-based ranking system is equal, and a search engine might look at a wide range of factors to determine how much weight each link on a page may pass along.

A diagram showing different values for links passing amongst three different web pages.

One of the signals used by Google to rank web pages looks at the links to and from those pages, to see which pages are linked to by others. Links from “important” pages carry more weight than links from less important pages. An important page under this system is one that is linked to by other important pages, or by a large number of less important pages, or a combination of the two. This signal is known as PageRank, and it is only one of a large number of Google ranking signals used to rank web pages and determine how highly those pages show up in search results in response to a query from a searcher.

An early paper by the founders of Google, The Anatomy of a Large-Scale Hypertextual Web Search Engine, tells us:

PageRank can be thought of as a model of user behavior. We assume there is a “random surfer” who is given a web page at random and keeps clicking on links, never hitting “back” but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank.
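The random surfer model quoted above can be sketched as a short power iteration. This is only an illustration and not Google's implementation: the function name `pagerank`, the three-page toy graph, the iteration count, and the way pages without outgoing links are handled are all assumptions made for the example. The damping value of 0.85 is the setting the paper itself mentions as typical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank for a small link graph.

    links maps each page to the list of pages it links to.
    (Hypothetical helper for illustration only.)
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # The "bored" surfer: with probability (1 - damping) the surfer
        # abandons the current page and starts on a random one.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one of the page's links,
                # each with equal probability.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links sends the surfer
                # to a random page.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy graph: "c" is linked to by both "a" and "b".
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
toy_ranks = pagerank(toy_links)
```

In this toy graph, "c" ends up with the highest score because two pages link to it, and "a" outranks "b" because its single inbound link comes from the highest-ranked page. Every link on a page passes an equal share here; a model in which links pass different amounts of weight would replace that equal split with per-link weights.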

What Makes a Good Seed Site for Search Engine Web Crawls?

Would search engines be better if they started web crawls from sites like Twitter or Facebook? Wikipedia or Mahalo? DMOZ or the Yahoo Directory?

The Web refreshes at an incredible rate, with new pages added, old pages removed, and words pouring out from blogs, news sites, and other genres of pages. Ecommerce sites showcase new products and eliminate old ones. New sites launch and old domains expire.

Search engines attempt to keep their indexes of the Web as fresh as possible, and send out crawling programs to find new pages, pick up changes, and detect removals. Failing to do so leaves a search engine with a stale index that sends people to deleted pages or overwritten content and misses new sites.

When a search engine starts crawling the Web, it often begins by following URLs from chosen seed sites to explore other pages and other domains. But how does a search engine choose those seed sites?
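The crawl described above, starting from seed URLs and following links outward, can be sketched as a breadth-first traversal. This is a minimal sketch under stated assumptions: the `crawl` function, the `get_links` callback, and the `TOY_WEB` link graph with its `.example` addresses are all hypothetical, and the code reads an in-memory dictionary rather than fetching real pages.

```python
from collections import deque


def crawl(seeds, get_links, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs.

    get_links(url) returns the URLs that a page links to.
    (Hypothetical helper for illustration only.)
    """
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # Queue each newly discovered URL exactly once.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory "web": each page maps to the pages it links to.
TOY_WEB = {
    "seed.example/": ["a.example/", "b.example/"],
    "a.example/": ["b.example/", "c.example/"],
    "b.example/": [],
    "c.example/": ["seed.example/"],
    "island.example/": ["a.example/"],
}

reached = crawl(["seed.example/"], lambda url: TOY_WEB.get(url, []))
```

Starting from `seed.example/`, the crawl never finds `island.example/`, because no page links to it. Starting from `island.example/` instead reaches all five pages, which is one small illustration of why the choice of seeds matters.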

New Reason to Submit Businesses to Google Maps: Google Navigator and Personal Information Management Integration?

If you have a business that you want customers to visit in person, and you haven’t added or verified that business in Google Maps, you may want to consider doing so. You can do this whether or not you have a web site.

The Google Navigator system that Google has developed for mobile phones allows people to navigate to destinations in their cars, and even search for types of nearby businesses rather than specific businesses at specific addresses. So, if you want to find a nearby Thai restaurant, you can type in “Thai restaurant” and Google will either show you the nearest one it knows about, or provide a list of restaurants that you can choose from.

A new patent application from Google hints at even more features for such a navigation system, which could pull in information from your personal information management software, such as contact lists, calendars, and task lists.

For instance, say you set up a task list on your smart phone to visit a new client, pick up stamps and mail out letters, drop off dry cleaning, and go grocery shopping. You’ve also added the new client’s address to your personal information system contact list and calendar.

Yahoo Exploring Virtual Reality?

A new patent filing from Yahoo raises a couple of interesting questions about the future of the company. It describes a wearable computing device that could be used in many ways, and the patent application provides a number of examples that sound like something out of a science fiction novel I read a year or so ago.

Patent illustration of a pair of goggles that are a wearable computing device.

Something else that’s interesting is the apple on the sidearm of the virtual goggles above, which the patent filing identifies as a visual power indicator. It looks surprisingly like something you would see on the back of an Apple laptop or on the main navigation bar at Apple.com. I don’t know if that has any significance at all, or if the creator of the image was having fun with the readers of the patent filing.

The pending patent application is:

Yahoo Study Shows Search Responsible for 1 in 5 Pageviews Online

Would it surprise you if searches on the Web make up around 10 percent of all pageviews on the Web, and indirectly lead to more than 21 percent of the pages viewed online? It surprised a couple of researchers from Yahoo.

That’s the result of a study conducted by Ravi Kumar and Andrew Tomkins from a sample of over 50 million user pageviews that they collected over 8 days in March 2009. The information was captured through the Yahoo toolbar from people who agreed to the collection of data for this kind of analysis. Additional information came from Yahoo’s search logs.

While the data is limited to users of the Yahoo toolbar who agreed to the use of the data, and doesn’t include mobile searches or searches that used AJAX to display results, it does capture how people browse the Web and search at a number of search engines as well as searches at sites like eBay and Amazon.

The study is described in a paper titled A Characterization of Online Search Behavior (pdf), and is being presented tomorrow at the WWW2010 Conference in a session dedicated to User Models on the Web.

How a Search Engine Might Crowdsource Web Spam Identification

The term crowdsourcing was coined by Wired correspondent Jeff Howe in a 2006 article titled The Rise of Crowdsourcing. In it, he described how a crowd of people might use their spare time to help solve problems, create content, or address other issues that a single person or organization might have difficulty addressing alone. Could a search engine effectively rely upon searchers to help clean up web spam in search results?

A crowd of people milling about, waiting on Lincoln's second inauguration speech.

What if search engines added a “feedback” button to every page that they showed in search results, where searchers could report pages in those results as web spam? Or what if they added a spam button to their toolbar that searchers could click to identify pages they found through a search as spam?

Is Your Site Faster than a Fortune 100 Company?

Google and Yahoo on Faster Web Pages

Earlier this month, Google announced that they would start considering the speed of a site as one of the ranking signals that they use to rank pages in search results.

Yahoo published a patent filing last year that also described how they might use page load and page rendering times as ranking signals. I wrote a post soon after it was published, Does Page Load Time influence SEO?, exploring how Yahoo and other search engines might look at different factors regarding the speed of pages, including the experience of users on web pages.

Google’s Matt Cutts wrote about the recent Google announcement, and provided some more details, telling us that it’s likely that less than 1 percent of queries would be affected by this change.

Who Benefits?

How a Search Engine May Identify Undesirable Web Pages By Analyzing Inlinks

A patent application from Yahoo published today uses the term “undesirable web pages” for pages that rank highly in search results because of links pointed at them solely to increase their rankings for specific queries, even though those pages may not be very relevant for the query terms in question.

“Undesirable” appears to indicate that these are pages that Yahoo doesn’t want ranking well in search results at their search engine.

So, what might Yahoo (and possibly other search engines) look at to determine whether a page is undesirable based upon the links it sees to that page?

Analyzing Inlinks for Manipulation

Learn SEO Directly from the Search Engines