Just what was the first search engine?

Some recent research I’ve been doing had me looking at the Infoseek search engine, and its part in the history of search engines. I remembered an old book I have on search engines which has a couple of chapters on Infoseek, and started to reread it.

The book is the Web Developer.com Guide to Search Engines, from February of 1998. It’s been a while since I’ve picked up a book about search engines which hasn’t mentioned Google. This one focuses upon the search engines on the web at that time, and on adding a search feature to your site.

I didn’t get much past the first section of the first chapter of the book, titled Bow Down and Give Thanks to Archie, before I hopped on the web and started looking at Archie’s role on the net. As it notes there:

The grandfather of all search engines was Archie, created in 1990 by Alan Emtage, a student at McGill University in Montreal.

While the chapter gives credit to Archie as the first search engine, it really doesn’t go into too much detail about what it was, and what it did. So I decided to look around a little on the web for more about what is commonly credited as the Grandfather of modern search engines.

A usenet posting from September of 1990, by Alan Emtage, referred to Archie as “pretty brain-damaged” and perhaps it was back then. But it does seem to have been the best way to find information from other servers around the internet at the time. A post from three years later, again by Alan Emtage, showed a little more confidence in the abilities of Archie. It also described a template indexing method that would help Archie index “freely available or Public Domain documents, images, sounds and services on the network.” In some ways, maybe this isn’t too different from today’s Google Sitemap program. A 1993 article, Life Before (And After) Archie, describes the commercialization of Archie, incorporating it with other services to help people find information on the internet.

Skeptical as I am, I wondered if Archie was really the internet’s first search engine. It might be, but asking the question and searching around a little led to this article which described a search process being considered carefully by the Royal Insurance Group and their partner, Hewlett Packard:

The First Search Engine: An Untold Story

It’s an interesting tale, but more of a “what could have been” type story. The search landscape would probably be very different if the technology described in this article was developed further.

So, how did Archie originally work? Well, it definitely didn’t have the capacities of today’s search engines, but it did allow you to look around the internet if you knew the name of a file you might be looking for. Archie didn’t index the content of text files. That capability came in 1991 with the development of another search tool, known as Gopher.
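To make the idea of searching by file name concrete, here is a minimal sketch in Python. It is not Archie's actual code or data format. The host names, paths, and the function names `build_index` and `search` are all made up for illustration. It only shows the general technique: gather file listings from servers, index them by file name, and match a search term against names rather than file contents.

```python
from collections import defaultdict

# Hypothetical listings (host -> file paths). These stand in for the file
# listings a name-based search service would gather from public servers.
LISTINGS = {
    "ftp.example.edu": ["/pub/gnu/emacs-18.59.tar.Z", "/pub/docs/README"],
    "ftp.example.org": ["/mirrors/gnu/emacs-18.59.tar.Z", "/pub/games/rogue.tar.Z"],
}


def build_index(listings):
    """Map each lowercased file name to the (host, path) pairs where it appears."""
    index = defaultdict(list)
    for host, paths in listings.items():
        for path in paths:
            # Keep only the last path component: the file name itself.
            name = path.rsplit("/", 1)[-1].lower()
            index[name].append((host, path))
    return index


def search(index, term):
    """Return sorted (host, path) pairs whose file name contains the term."""
    term = term.lower()
    hits = []
    for name, locations in index.items():
        # Only the name is compared; file contents are never examined.
        if term in name:
            hits.extend(locations)
    return sorted(hits)


index = build_index(LISTINGS)
for host, path in search(index, "emacs"):
    print(host, path)
```

A search for "emacs" here turns up both hosts that carry the file, while a search for a word that appears only inside a document would find nothing, which is the limitation described above.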

A paper from 1992, A Comparison of Internet Resource Discovery Approaches, looks at some of the early indexing programs on the web, including Archie, and a standard for searching called X.500.

X.500 was “a distributed directory service standard” developed by The Consultative Committee on International Telephony and Telegraphy (merged into the International Telecommunication Union in 1992) and the International Organization for Standardization (ISO). However, this standard doesn’t appear to allow the type of searches that Archie did, and it required much more work on the part of the hosts of files.

Whois was also around before Archie, but looked at people, network numbers and domains on the Internet. It was more of an information directory about the net, which could be searched, than a way to find files on the internet. The paper describes some other interesting early directory and search mechanisms.

Chapter 5, from the book The daemon, the gnu, and the penguin: A History of Free and Open Source, tells us a little about the size and scope of Archie: “In 1992 it contained about 2.6 million files with 150 gigabytes of information.” For the time, that was pretty significant. A paper from 1993, Research Problems for Scalable Internet Resource Discovery (pdf), tells us that Archie was pretty active then, but seeing some signs of strain in handling searches:

The global collection of Archie servers process approximately 50,000 queries per day, generated by a few thousand users worldwide. Every month or two of Internet growth requires yet another replica of Archie. A dozen Archie servers now replicate a continuously evolving 150 MB database of 2.1 million records. While it responds in seconds on a Saturday night, it can take five minutes to several hours to answer simple queries during a weekday afternoon.

Of course, the popularity of the World Wide Web changed lots of things. One early method of indexing the web, created by Martijn Koster who was one of the chief architects of the Standard for Robots Exclusion, was ALIWEB. The name is short for Archie-Like Indexing in the Web. ALIWEB didn’t quite take off the way other search engines would, but Martijn Koster’s work on robots would become an important part of those future search engines’ growth.

It’s going to take a while to get through that book if I keep getting sidetracked like this. That doesn’t seem to be a bad thing though.


8 thoughts on “Just what was the first search engine?”

  1. Thanks for the story Bill. For a kid like me, it’s useful to know what was going on before 2000, when I first used Google.

  2. Thank you, Nadir.

    I didn’t start getting heavily involved with computers until 1994, when a couple of friends started showing me how to build them. That led me to going online in 1995, and to building web pages in 1996.

    So, I really didn’t get too involved with Archie, or Gopher, or many of those other ways of interacting with the net that were more common before the web.

    I do think it does pay to know some of this history. Some of the ideas that we see show up in patent applications and patents these days aren’t as new as we might think. I like these lines from Danny Sullivan from an article he wrote in 2001:

    Most important is the fact that our current group of search engines all use their own different types of technologies to generate results, and many have patents on the exact techniques they use. That hasn’t prevented other search engines from coming up with their own techniques. For example, Direct Hit has patents relating to the use of clickthrough measurements to improve results. That hasn’t stopped Inktomi, Yahoo and others from tracking clicks.

    More on the controversy described in Danny’s article here: Search Engine Creator: AltaVista Patents Bogus.

  3. Pingback: Words and Software
  4. That’s fascinating, Bill. Just think with the right happenstances, we might all be Archie-ing to find things now rather than Googling. .. and it happened here at McGill. Sounds like a theme I might develop. :)

  5. The internet would be a very different place, wouldn’t it? There are so many competing technologies being developed online, and you never quite know which one will capture the imagination of the public, and take off. I’ll look forward to seeing something you might write if you do come out with an article on the subject.

  6. I became involved with commercial printing computers in the late 70s (DEC 8s) with huge hard drives that held little, but it wasn’t till 1996 that I owned a PC, and the first Pentiums could outdo all that I had seen with our huge systems. My first search engine was Archie, and I thought it was fabulous that I could retrieve information that was extensive and informative without, of course, commercial influence. Now when I search for early search engines, Archie is not included and I don’t know why. It was the one my internet provider suggested.

  7. Hi Brian,

    The web has taken over, and archie just doesn’t hold the place it once had. I guess that since it doesn’t use the web, most people don’t include it in amongst the early search engines.

    Don’t know if you’ve seen this, but for noncommercial searches, you could try Yahoo’s Mindset.

    Or even give Google’s advanced search operators a shot and limit the domains that you are searching to .gov or .edu using the “site” command, like this:

    search phrase site:.edu

    search phrase site:.gov

  8. Pingback: Punkstar’s Pad - A Webpreneur’s Blog » Blog Archive » Use, the Improve
