Search engines use programs to crawl the web and identify new or newly updated pages to include in their indexes. These programs are often referred to as robots, crawlers, or spiders. But crawling isn't the only way a search engine learns about pages that it might include in search results.
A whitepaper from Google, Sitemaps: Above and Beyond the Crawl of Duty (pdf), examines the effectiveness of XML sitemaps, which Google announced as an experiment called Google Sitemaps in 2005. The experiment seems to have been a success.
XML sitemaps are a way for site owners to help search engines index the pages on their sites: an XML file lists the site's URLs, optionally with metadata such as when each page was last modified. Yahoo and Microsoft joined Google in supporting XML sitemaps not long after, and a set of pages explaining the sitemaps protocol was launched.
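To make the mechanics concrete, here is a minimal sketch of generating such a file in Python. It is not taken from the whitepaper; the example.com URLs and the build_sitemap helper are made up for illustration, but the urlset, url, loc, and lastmod elements and the namespace follow the published sitemaps protocol.

```python
# A minimal sketch, not from the whitepaper: the example.com URLs and the
# build_sitemap helper are illustrative, but the urlset/url/loc/lastmod
# elements and the namespace follow the sitemaps.org protocol.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """entries: list of (loc, lastmod) pairs; lastmod (W3C date) may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc              # required: the page's URL
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod  # optional: e.g. 2008-10-01
    return ET.tostring(urlset, encoding="unicode")

if __name__ == "__main__":
    xml = build_sitemap([
        ("http://www.example.com/", "2008-10-01"),
        ("http://www.example.com/about", None),
    ])
    # A real sitemap file would also begin with an XML declaration.
    print(xml)
```

The protocol also allows optional changefreq and priority elements, and site owners can point crawlers at the file with a Sitemap line in robots.txt or by submitting it to the search engines directly.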
The paper tells us that, as of October 2008, approximately 35 million websites publish XML sitemaps, providing data for several billion URLs. While XML sitemaps have been adopted by a large number of sites, we haven't had much information from any of the search engines on how helpful those sitemaps have been, how they might be used together with web crawling programs, or whether they make a difference in how many pages get indexed, and how quickly.