Humans aren’t the only ones who submit queries through search engines.
There are individuals and organizations that use search agents to find and collect information through search engines, serving a range of informational needs, including search augmentation services that help searchers as they surf the web, and agents used by recommendation sites, metasearch engines, and shopping comparison sites.
A study from November 2006 provides some details about search agents, how their interactions may differ from those of human searchers, and what that might mean for search engines. The paper is on the fairly technical side, so I decided to look up a number of the search agents referred to in the document, which are interesting in their diversity and range of purposes.
I listed and linked to a number of the agents, or to papers that describe them, and I’d recommend looking at a few of them to get a sense of their scope and uses before tackling the paper, which I’ll point to first.
The paper is written by Bernard J. Jansen and Tracy Mullen of the Pennsylvania State University, Amanda Spink of the University of Pittsburgh, and Jan Pedersen of Yahoo.
Their research looked at three sets of queries and page views from search agents that used Excite and AltaVista from 1997 to 2002, encompassing around 900,000 queries from over 3,000 agents.
Findings include:
(1) agent sessions are extremely interactive, sometimes with hundreds of interactions per second;
(2) agent queries are comparable to those of human searchers, with little use of query operators;
(3) Web agents search for a relatively limited variety of information, with only 18% of the terms used being unique; and
(4) the duration of agent-to-search-engine interaction typically spans several hours.
The paper doesn’t provide links to the search agents it discusses, so I looked up a number of them to see firsthand what kinds of uses these things might be making of search engines. Some of the links below point to pages for the agents themselves, while others point to papers about how the agents work:
- WebWatcher
- CiteSeer
- MySpiders
- Marie-4 crawler
- The Remembrance Agent
- Letizia
- Who’s That Actor? The InfoSip TV Agent (pdf)
- WebMate
- CorpusBuilder
- BASAR
- Ithaki (uses search agents)
- Dogpile (uses search agents)
The study is limited to data from queries submitted to Excite and AltaVista between 1997 and 2002, and many of the agents studied were academic search agents, a number of which no longer appear to be active. It would be interesting to see some more modern data and results, though you have to wonder whether Google or Yahoo or Microsoft or Ask would share this type of information with researchers.
Interesting, though. When we talk about user behavior in connection with queries performed on search engines, we aren’t always talking about users who are people.
Bill,
Are these user agents related to scraping, to other indexing agents like spiders, or to educational research (like the links provided above), or are all of those included?
Do you have any idea of the impact on the engines?
pittfall
Hi Steve,
The authors of the paper state that theirs is the first in-depth research paper on the behavior of search agents. It’s really hard to tell how many search agents there are out there, but I think their uses are fairly widespread.
Here are a handful more:
1. Crime investigation.
FF POIROT (financial fraud prevention oriented information resources using ontology technology)
According to “The taming of the sleuth – problems and potential of autonomous agents in crime investigation and prosecution,” FF POIROT uses automated searches on search engines such as Google and AltaVista as part of the processes it follows.
2. Personal Search Tools
Client-based applications like:
Aware,
Copernic,
FirstStop Web Search,
PR infoFinder
3. Plagiarism Detection and copyright infringement
Copyscape
Copyscape notes that it uses the Google API. I’m not sure whether some of the personal search assistants are using APIs as well.
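To make the mechanics a little more concrete, here’s a minimal sketch of how a plagiarism-detection agent might use a search API: sample distinctive phrases from a document, submit each as a quoted query, and flag pages on other hosts that return them. The endpoint, key, and response format below are hypothetical stand-ins, not Copyscape’s or Google’s actual interface.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and key: stand-ins, not a real search API.
SEARCH_ENDPOINT = "https://api.example-search.com/v1/search"
API_KEY = "YOUR_API_KEY"

def search_urls(phrase):
    """Submit one quoted-phrase query and return the result URLs."""
    params = urllib.parse.urlencode({"q": '"%s"' % phrase, "key": API_KEY})
    with urllib.request.urlopen(SEARCH_ENDPOINT + "?" + params) as resp:
        data = json.load(resp)
    # Assumes a response shaped like {"results": [{"url": "..."}, ...]}.
    return [item["url"] for item in data.get("results", [])]

def find_copies(document_text, original_url, phrase_len=8, step=40):
    """Sample phrases from the document and collect other sites serving them."""
    words = document_text.split()
    suspects = set()
    for start in range(0, max(1, len(words) - phrase_len), step):
        phrase = " ".join(words[start:start + phrase_len])
        for url in search_urls(phrase):
            if not url.startswith(original_url):
                suspects.add(url)
    return suspects
```

Sampling only every fortieth word keeps the number of automated queries down, which is exactly the kind of traffic pattern the study measured.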
Some other software that may use web searches to find information could include reputation management software, intellectual property protection software, many types of vertical search engines (Google’s custom search engine feature shows how effective that could be), meta search engines, and scraping software.
Many sites don’t specifically state where they get their information, or whether they are using search agents that find information through search engines.
Does that mean that a high number of searches could be performed by these search agents, thus creating false search data for those performing keyword research?
Hi Keyword Research,
A recent Yahoo patent application described some methods they might use to distinguish visitors that are programs from human visitors as they go about collecting user behavior data.
Those could involve looking at things like very high search speeds and patterns in the way specific searchers look for things. It is possible that some keyword research tools on the market don’t make those distinctions, and that can impact the search data you see when you are conducting research.
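As a rough illustration of that kind of rate-based check (a sketch of the general idea, not the method claimed in the patent filing), a search engine could scan the query timestamps in a session and flag timing that no human searcher could produce. The thresholds and log format here are assumptions for the example.

```python
from datetime import timedelta

# Illustrative thresholds; not values taken from the patent filing.
MIN_HUMAN_GAP = timedelta(seconds=1)    # humans rarely submit queries this fast
MAX_HUMAN_SESSION = timedelta(hours=3)  # agent sessions can run on for hours

def looks_like_agent(query_times):
    """Flag a session whose query timing no human searcher could produce."""
    times = sorted(query_times)
    if len(times) < 2:
        return False
    gaps = [b - a for a, b in zip(times, times[1:])]
    rapid = sum(1 for g in gaps if g < MIN_HUMAN_GAP)
    # Mostly rapid-fire queries, or a session running far longer than a human would.
    return rapid / len(gaps) > 0.5 or (times[-1] - times[0]) > MAX_HUMAN_SESSION
```

A session of queries arriving a few hundred milliseconds apart, or stretching over several hours as the study found for agents, would be flagged; a keyword tool that skips this kind of filtering would count that traffic alongside human queries.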
Some other changes in the way that search engines work may also make some of those tools less helpful to a researcher than they might otherwise be. For example, for some queries a search engine might decide that certain other phrases are similar enough to include in the search results for the original search phrase, and expand the results for that query, making it look like there are more results for a specific term than there really are.
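Here’s a toy sketch of that expansion effect, with a made-up synonym table standing in for whatever phrase relationships an engine actually mines from its own data:

```python
# Toy synonym table; a real engine would derive these relationships itself.
SIMILAR_PHRASES = {
    "cheap laptops": ["inexpensive laptops", "budget notebooks"],
}

def expand_query(query):
    """Return the original query plus phrases the engine treats as similar."""
    return [query] + SIMILAR_PHRASES.get(query, [])

def reported_result_count(query, count_for):
    """Summing counts across expanded phrases inflates the apparent total."""
    return sum(count_for(phrase) for phrase in expand_query(query))

# If "cheap laptops" alone matches 1,000 pages but its two variants match
# 4,000 more, the engine may report 5,000 results for the original query.
```

A researcher reading that expanded count as demand for the exact phrase would overestimate it.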