Some recent research I’ve been doing had me looking at the Infoseek search engine, and its part in the history of search engines. I remembered an old book I have on search engines which has a couple of chapters on Infoseek, and started to reread it.
The book is the Web Developer.com Guide to Search Engines, from February of 1998. It’s been a while since I’ve picked up a book about search engines which hasn’t mentioned Google. This one focuses upon the search engines on the web at that time, and on adding a search feature to your site.
I didn’t get much past the first section of the first chapter of the book, titled Bow Down and Give Thanks to Archie, before I hopped on the web and started looking at Archie’s role on the net. As it notes there:
“The grandfather of all search engines was Archie, created in 1990 by Alan Emtage, a student at McGill University in Montreal.”
While the chapter gives credit to Archie as the first search engine, it really doesn’t go into too much detail about what it was, and what it did. So I decided to look around a little on the web for more about what is commonly credited as the Grandfather of modern search engines.
A Usenet posting from September of 1990, by Alan Emtage, referred to Archie as “pretty brain-damaged,” and perhaps it was back then. But it does seem to have been the best way to find files on other servers around the internet at the time. A post from three years later, again by Alan Emtage, showed a little more confidence in the abilities of Archie. It also described a template indexing method that would help Archie index “freely available or Public Domain documents, images, sounds and services on the network.” In some ways, maybe this isn’t too different from today’s Google Sitemap program. A 1993 article, Life Before (And After) Archie, describes the commercialization of Archie, which bundled it with other services to help people find information on the internet.
Skeptical as I am, I wondered if Archie was really the internet’s first search engine. It might be, but asking the question and searching around a little led me to this article, which described a search process being considered carefully by the Royal Insurance Group and their partner, Hewlett Packard.
It’s an interesting tale, but more of a “what could have been” type story. The search landscape would probably be very different if the technology described in this article was developed further.
So, how did Archie originally work? Well, it definitely didn’t have the capacities of today’s search engines, but it did allow you to look around the internet if you knew the name of a file you might be looking for. Archie didn’t index the content of text files. That capability came in 1991 with the development of another search tool, known as Gopher.
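To make the idea of searching by file name concrete, here is a rough sketch in Python. It is only an illustration of name-based lookup, not a reconstruction of how Archie was actually built. The host names, paths, and the `build_index` and `find` helpers are all made up for the example.

```python
from collections import defaultdict

# Hypothetical directory listings, as if gathered from public file servers.
# These hosts and paths are invented for illustration.
LISTINGS = {
    "ftp.example.edu": ["/pub/gnu/emacs.tar.Z", "/pub/docs/README"],
    "ftp.example.org": ["/mirrors/emacs.tar.Z", "/pub/games/nethack.tar.Z"],
}


def build_index(listings):
    """Map each bare file name to the (host, path) pairs where it appears."""
    index = defaultdict(list)
    for host, paths in listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1]  # keep only the final path component
            index[name].append((host, path))
    return index


def find(index, query):
    """Return locations of files whose *name* contains the query.

    Only names are matched (case-insensitively); file contents are never read.
    """
    q = query.lower()
    hits = []
    for name in sorted(index):
        if q in name.lower():
            hits.extend(index[name])
    return hits


index = build_index(LISTINGS)
for host, path in find(index, "emacs"):
    print(host, path)
```

A search for "emacs" turns up both copies of the file, while a search for a phrase that might appear inside the README finds nothing, because only names are indexed.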
A paper from 1992, A Comparison of Internet Resource Discovery Approaches, looks at some of the early indexing programs on the internet, including Archie, and a standard for searching called X.500.
X.500 was “a distributed directory service standard” developed by The Consultative Committee on International Telephony and Telegraphy (merged into the International Telecommunication Union in 1992) and the International Organization for Standardization (ISO). However, this standard doesn’t appear to allow the type of searches that Archie did, and it required much more work on the part of the hosts of files.
Whois was also around before Archie, but looked at people, network numbers and domains on the Internet. It was more of an information directory about the net, which could be searched, than a way to find files on the internet. The paper describes some other interesting early directory and search mechanisms.
Chapter 5 of the book The Daemon, the Gnu, and the Penguin: A History of Free and Open Source tells us a little about the size and scope of Archie: “In 1992 it contained about 2.6 million files with 150 gigabytes of information.” For the time, that was pretty significant. A paper from 1993, Research Problems for Scalable Internet Resource Discovery (pdf), tells us that Archie was pretty active then, but showing some signs of strain in handling searches:
“The global collection of Archie servers process approximately 50,000 queries per day, generated by a few thousand users worldwide. Every month or two of Internet growth requires yet another replica of Archie. A dozen Archie servers now replicate a continuously evolving 150 MB database of 2.1 million records. While it responds in seconds on a Saturday night, it can take five minutes to several hours to answer simple queries during a weekday afternoon.”
Of course, the popularity of the World Wide Web changed lots of things. One early method of indexing the web was ALIWEB, created by Martijn Koster, who was one of the chief architects of the Standard for Robots Exclusion. The name is short for Archie-Like Indexing in the Web. ALIWEB didn’t quite take off the way other search engines would, but Martijn Koster’s work on robots would become an important part of those future search engines’ growth.
It’s going to take a while to get through that book if I keep getting sidetracked like this. That doesn’t seem to be a bad thing though.