Search engines use programs to crawl the web and identify new and newly updated pages to include in their indexes. These programs are often called robots, crawlers, or spiders. But search engines also have other ways of learning about pages that they might include in search results.
A whitepaper from Google, Sitemaps: Above and Beyond the Crawl of Duty (pdf), examines the effectiveness of XML sitemaps, which Google announced as an experiment called Google Sitemaps in 2005. The experiment seems to have been a success.
XML sitemaps give website owners a way to help search engines index the pages on their sites. Yahoo and Microsoft joined Google in supporting XML sitemaps not long after, and a set of pages explaining the Sitemaps protocol was launched.
According to the paper, approximately 35 million websites published XML sitemaps as of October 2008, providing data for several billion URLs. While XML sitemaps have been adopted by a large number of sites, we haven’t had much information from any of the search engines on several questions: how helpful those sitemaps have been, how they might be used together with web crawling programs, and whether they make a difference in how many pages get indexed and how quickly.
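For readers who haven't seen one, here is a minimal sketch of a sitemap file that follows the sitemaps.org protocol, generated with Python's standard library. Only the `urlset`, `url`, `loc`, and `lastmod` elements and the namespace come from the protocol. The `build_sitemap` helper and the example.com URLs are illustrative assumptions. A real sitemap file would normally also be saved as UTF-8 with an XML declaration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a minimal XML sitemap string from (url, lastmod) pairs.

    lastmod is a W3C Datetime string such as "2008-10-01"; pass None to omit it.
    build_sitemap is a hypothetical helper, not part of any sitemap tool.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.example.com/", "2008-10-01"),
        ("http://www.example.com/new-page.html", None),
    ]))
```

Supplying `lastmod` for each URL gives a crawler one hint about which pages have changed since its last visit.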
Continue reading “Google Study Shows Use of XML Sitemaps Helps Index Fresh Content Quicker”
Many tasks are trivial for humans but continue to challenge even the most sophisticated computer programs. Traditional computational approaches to solving such problems focus on improving artificial intelligence algorithms. Here, we advocate a different approach: the constructive channeling of human brainpower through computer games. Toward this goal, we present general design principles for the development and evaluation of a class of games we call “games with a purpose,” or GWAPs, in which people, as a side effect of playing, perform tasks computers are unable to perform.
– Designing Games With A Purpose
A paper from Yahoo researchers, Thumbs-Up: A Game for Playing to Rank Search Results, describes a game that they developed and tested internally at Yahoo to allow participants to compete against each other in ranking how relevant pages are for specific search queries.
Continue reading “Is Game Playing the Future Ranking System for Search Results?”
Is there any value in using keywords in the URLs of web pages? Would a search engine look at keywords that you might include in the addresses of your pages, and associate those keywords with the content of your pages in the search engine’s index?
If so, how would a search engine go about breaking the URLs of your pages down into meaningful parts to identify keywords?
Breaking URLs down into parts may also play a role in how the pages of a web site might be crawled by a search engine.
A newly published Yahoo patent application gives us some ideas about how Yahoo might extract keywords from the URLs of pages and rank them. It also describes how information uncovered in that process might be used to decide which pages on a web site to crawl first.
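The sketch below illustrates only the first step: splitting a URL on its delimiters and dropping parts that rarely carry meaning. It is hypothetical and is not the method described in Yahoo's patent application. The `url_keywords` function and the `URL_STOPWORDS` list are assumptions made for this example. It also doesn't try to segment run-together words such as "luxurycars."

```python
import re
from urllib.parse import urlsplit, unquote

# Hypothetical list of URL parts that rarely say anything about a page's topic.
URL_STOPWORDS = {"www", "com", "org", "net", "html", "htm", "php", "asp", "index"}


def url_keywords(url):
    """Split a URL into lowercase candidate keywords (a rough sketch)."""
    parts = urlsplit(url)
    # Combine host, path, and query string; the scheme ("http") is dropped.
    text = " ".join([parts.hostname or "", unquote(parts.path), unquote(parts.query)])
    # Break on anything that isn't a letter or digit: dots, slashes,
    # hyphens, underscores, "=", "&", and so on.
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    # Drop empty strings, stopwords, and pure numbers such as dates or IDs.
    return [t for t in tokens if t and t not in URL_STOPWORDS and not t.isdigit()]
```

For example, `url_keywords("http://www.example.com/luxury-cars/infiniti_g37.html?color=red")` returns `["example", "luxury", "cars", "infiniti", "g37", "color", "red"]`. A search engine would still need some way to decide which of these parts actually describe the page.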
Continue reading “Do Search Engines Look at Keywords in URLs?”
When a search engine indexes pages and other documents on the web, hoping to provide meaningful and relevant results to searchers, it doesn’t rely only upon the content found on web pages. It also considers the quality and quantity of links pointing to those pages.
A search engine like Google might determine that a page is relevant to a specific query based upon the content found on that page, and the anchor text found in links pointing to the page.
It might also look at what it considers “relationships” between pages by examining how those pages link to each other. PageRank is one method that Google states it uses to analyze those links, assigning a measure of importance to pages based upon the pages that link to them. That measure, or rank, can be described in simplified terms as the probability that someone who randomly clicks on links from page to page would arrive at a particular page.
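The random-surfer description of PageRank can be sketched in a few lines of Python using power iteration. This is the textbook formulation, not Google's production system. The damping factor of 0.85 is the value commonly used in descriptions of PageRank, and the `pagerank` function and example graph are illustrative.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Random-surfer PageRank computed by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # With probability 1 - damping, the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one of this page's links at random.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

For the graph `{"A": ["B", "C"], "B": ["C"], "C": ["A"]}`, page C ends up with a higher score than page B because C receives links from both A and B.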
Continue reading “Google Patent Granted on Web Link Spam”
Why does Google customize some search results based upon a previous query that you’ve performed? Is there a special relationship between those query terms, and if so, how did Google define that relationship?
Imagine searching for “luxury car” at Google, and then performing another search for “infiniti.” On the second search, you find a page in the search results that looks like it will provide the information you are looking for, and you select it.
Now imagine that a number of other people perform the same series of searches and select the same page.
It’s possible that Google might start considering the search for “luxury car” and the search for “infiniti” to be related queries. It’s also possible that the page selected in the second search for “infiniti” might start ranking more highly for the query “luxury car.”
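One simple way to picture how such a relationship could be detected is to count how often searchers run one query, follow it with another, and then click the same result. The sketch below is hypothetical and does not come from Google. The session format, the `related_query_clicks` function, and the `min_support` threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def related_query_clicks(sessions, min_support=2):
    """Find pages clicked after one query that might be related to an earlier query.

    sessions: list of search sessions, each an ordered list of
    (query, clicked_url_or_None) pairs. Returns a dict mapping an earlier
    query to a list of (clicked_url, later_query, count) tuples.
    """
    counts = Counter()
    for session in sessions:
        # Count each (earlier query, later query, clicked page) pattern once per session.
        seen = set()
        for i, (later_query, clicked) in enumerate(session):
            if clicked is None:
                continue
            for earlier_query, _ in session[:i]:
                if earlier_query != later_query:
                    seen.add((earlier_query, later_query, clicked))
        counts.update(seen)
    # Keep only patterns repeated by at least min_support sessions.
    candidates = defaultdict(list)
    for (earlier, later, page), n in counts.items():
        if n >= min_support:
            candidates[earlier].append((page, later, n))
    return dict(candidates)
```

If three sessions each search for “luxury car,” then “infiniti,” and then click the same page, that page becomes a candidate for the “luxury car” query. A pattern seen in only one session is ignored.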
Continue reading “How Searchers’ Queries Might Influence Customized Google Search Results”
What concepts does your website cover?
A search engine might look at phrases that you use on your pages to get an idea of the concepts covered by your site.
The search engine might try to decide that certain phrases you use are the “top phrases” that describe topics or concepts about your site.
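A very simple way to choose “top phrases” is to count how many pages of a site use each two-word phrase. The sketch below does only that. It is an assumption for illustration and not the phrase-based indexing approach in Google's patent filings. The `top_phrases` function, its stopword list, and the sample pages are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopwords; phrases that start or end with one are skipped.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "for", "on", "is"})


def top_phrases(pages, n=2, k=3, stopwords=STOPWORDS):
    """Rank n-word phrases by how many pages of a site contain them."""
    doc_freq = Counter()
    for text in pages:
        words = re.findall(r"[a-z0-9']+", text.lower())
        phrases = set()  # count each phrase at most once per page
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in stopwords or gram[-1] in stopwords:
                continue
            phrases.add(" ".join(gram))
        doc_freq.update(phrases)
    # Sort by page count (descending), breaking ties alphabetically.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ranked[:k]]
```

With pages about XML sitemaps and crawling, a frequency count like this can surface phrases such as “search engines” ahead of “xml sitemaps,” even when the site owner would say the site is really about sitemaps.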
But what if the search engine is wrong?
What if those top phrases don’t reflect the content of your site accurately? What if some other phrases more meaningfully indicate what your site is about?
If a search engine assigned phrases to your site that might affect how your pages are presented to searchers in response to queries, would you want the search engine to give you the chance to change the phrases it thinks your site is about?
New Google Phrase-Based Indexing Patent Filing
Continue reading “What are the Top Phrases for Your Website?”