Books, Articles, and Papers
Some papers, books, and articles that I found online and wanted to share.
As We May Think
by Vannevar Bush
In July of 1945, Vannevar Bush speculated about what scientists who had worked on the war effort should turn their hands to next to make the world a better place. His article urged scientists to focus on making knowledge more accessible to everyone, and described an idea that in many ways foreshadowed the emergence of the internet.
In this paper, published in Science on July 15, 1955, Eugene Garfield proposes a citation index for scientific articles, in many ways like the legal Shepard’s Citation, which helps lawyers and legal scholars in US State and Federal Courts find publications and court cases that refer to other cases. Garfield’s work on citation analysis influenced how links are treated as citations in algorithms such as PageRank.
Improved Text Searching in Hypertext Systems (pdf)
by Lawrence Page
The first PageRank patent, filed by Lawrence Page with the USPTO on January 10, 1997. This provisional filing, which was never actually assigned or published in the patents database, gives a plain language description of PageRank and the Backrub search engine, and compares Backrub with other search engines of the time.
Hypersearching the Web
by Soumen Chakrabarti, Byron Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, Jon M. Kleinberg, and David Gibson
IBM’s CLEVER Project explored how analyzing links between pages could be useful in indexing the Web, around the same time that Google was developing its PageRank approach. While the team never publicly released a search engine, many of the concepts they developed were used by Teoma/Ask Jeeves. This paper describes the concepts of “Authorities” and “Hubs” within a collection of pages for a query on a specific topic: authorities are pages that many other pages link to, and hubs are pages that link out to many other pages.
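The relationship between hubs and authorities is easier to see with a toy example. The following is a minimal sketch of the mutually reinforcing scoring idea, not the CLEVER team's actual implementation; the function name `hits`, the fixed iteration count, and the four-page link graph are all assumptions made up for illustration.

```python
import math


def hits(links, iterations=50):
    """Score pages as hubs and authorities.

    `links` maps each page to the list of pages it links to.
    This is a simplified illustration, not production code.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hubs = {page: 1.0 for page in pages}
    auths = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # A page's authority score is the sum of the hub scores
        # of the pages that link to it.
        auths = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auths[target] += hubs[source]

        # A page's hub score is the sum of the authority scores
        # of the pages it links to.
        hubs = {
            page: sum(auths[target] for target in links.get(page, []))
            for page in pages
        }

        # Normalize both score sets so the values do not grow without bound.
        for scores in (auths, hubs):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm

    return hubs, auths


# Hypothetical pages on one topic: two link pages and two papers.
example_links = {
    "guide": ["paper1", "paper2"],
    "blog": ["paper1"],
    "paper1": [],
    "paper2": [],
}
hubs, auths = hits(example_links)
print("best hub:", max(hubs, key=hubs.get))
print("best authority:", max(auths, key=auths.get))
```

In this made-up graph, the page that links out to the most well-cited pages ends up as the strongest hub, and the page cited by the most hubs ends up as the strongest authority.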
The Semantic Web
by Tim Berners-Lee, James Hendler, and Ora Lassila
An effort to help computers better understand content and data on the Web, and to enable that data to be shared widely and quickly. This is one of the first and one of the best-known papers about the Semantic Web.
by R. Guha (IBM Research), Rob McCool (Knowledge Systems Lab), and Eric Miller (W3C/MIT)
A look at some of the early challenges of the Semantic Web, and at how collecting information from the Web of Data differs from crawling the Web of pages. This includes a look at Documents vs Real World Objects, Human vs Machine Readable Information, and the Relation between the HTML & Semantic Web.
Claude Hopkins published this classic book on advertising in 1923, and it’s still very relevant for today’s world of online marketing and advertising.
Introduction to Information Retrieval
by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
A thoughtful look at how search works from a computer science perspective. Highly recommended for those who like to delve into the science behind search.
Search User Interfaces
by Marti Hearst
A very readable and very informative book that approaches how search engines work not from the algorithms behind the scenes, but rather from the interfaces that you see when you search. If you want to quickly learn a lot about how search engines work, this is a great place to start.
The Anatomy of a Large-Scale Hypertextual Web Search Engine (pdf)
by Sergey Brin and Lawrence Page
One of the very first white papers that provided a glimpse into how a commercial search engine works. The search engine in question is Google. Even though this paper was written more than a decade ago and mostly offers historical perspective on Google and search, there are hints in it of things to come from the search engine.
The PageRank Citation Ranking: Bringing Order to the Web
by Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd
If you’ve heard of Google, chances are you’ve also heard about PageRank, which is a method that the search engine used to rank how important pages are on the Web, and which has been combined with other ranking signals to determine the order of pages you see when you search. It’s very likely that the PageRank of 1998, as described in this paper, has evolved over the last decade, but it’s worth reading about how it was intended to work in the early days.
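For readers who want a feel for the basic idea before reading the paper, here is a minimal sketch of an iterative PageRank-style calculation. It is a simplified illustration under stated assumptions, not Google's implementation: the function name `pagerank`, the iteration count, the way pages without outgoing links are handled, and the three-page graph are all made up for this example. The damping factor of 0.85 is a commonly cited value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate a PageRank-style score for each page.

    `links` maps each page to the list of pages it links to.
    Scores are kept as a probability distribution that sums to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads
                # its rank evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank

    return rank


# Hypothetical three-page web: "a" and "b" both link to "c", and "c" links to "a".
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(pagerank(example_links).items()):
    print(page, round(score, 3))
```

In this toy graph, the page with the most incoming links collects the most rank, and the page it links to inherits much of that importance, which is the intuition behind treating links as citations.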
Shaping the Web: Why the politics of search engines matters (pdf)
by Lucas D. Introna and Helen Nissenbaum
Published in 2000, this paper looks at the potential biases in which search engines may engage, arising not so much from technical issues, but rather political ones. Why might some types of sites be excluded from search results while others might be favored? A thoughtful criticism of popularity-based search algorithms and the purchasing of prominence in search results.
Published in 1989, this paper discusses a different kind of search interface than what often gets discussed in Information Retrieval circles, where a single search is often part of a multiple page and multiple query inquiry for information. A thoughtful paper that might have you thinking about designing web pages a little differently.
This set of usability guidelines from the Department of Health and Human Services is helpful, creative, and smart. If you design web sites and you haven’t seen them, you should take a look. You might get some ideas on how to make your sites more usable for visitors.
Published in 2009, this online version of the book provides a great first look at the computer science behind how search engines work.
Helpful Government Sites
Search Engine Resources
- Google Webmaster Guidelines
- Google Webmaster Central
- Yahoo Search Help
- Yahoo! Search Content Quality Guidelines
- Webmaster Central – Bing
Search Data Related Blogs
Web Dragons: Inside the Myths of Search Engine Technology
by Ian H. Witten, Marco Gori, and Teresa Numerico
Search Engines: Information Retrieval in Practice
by Bruce Croft, Donald Metzler, and Trevor Strohman
Mining the Web: Discovering Knowledge from Hypertext Data
by Soumen Chakrabarti
Ambient Findability: What We Find Changes Who We Become
by Peter Morville
Algorithms of the Intelligent Web
by Haralambos Marmanis and Dmitry Babenko
Search Patterns: Design for Discovery
by Peter Morville and Jeffery Callender
Information Retrieval: Implementing and Evaluating Search Engines
by Stefan Büttcher, Charles L.A. Clarke, and Gordon V. Cormack
Letting Go of the Words: Writing Web Content that Works
by Janice (Ginny) Redish
Don’t Make Me Think
by Steve Krug