Google described some of its janitors in a recent patent application.
Google has been working on extracting data from a wide variety of sources on the Web, but much of that information has problems. Some examples:
One site may present information in one format, while other sites use different formats.
Information from one web page may contradict information from others.
Some data may become stale over time.
When Google collects this kind of information, a lot of it needs to be cleaned up, and Google’s “Janitors” spring into action to do that.
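To make the three problems above concrete, here is a toy sketch in Python. It is not taken from the patent application and is not Google's method. The sample facts, the list of date formats, the three-year staleness cutoff, and the function names `normalize_date` and `clean` are all assumptions made up for illustration. It drops stale facts, converts the remaining values to one format, and settles contradictions with a simple majority vote.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical extracted facts: (source, attribute, raw value, date collected).
FACTS = [
    ("site-a", "founded", "2001-04-02", date(2007, 5, 1)),
    ("site-b", "founded", "April 2, 2001", date(2007, 6, 1)),
    ("site-c", "founded", "04/03/2001", date(2007, 6, 15)),
    ("site-d", "founded", "2001-04-03", date(2001, 1, 1)),
]

# Assumed set of formats that different sites might use for the same value.
DATE_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%m/%d/%Y")


def normalize_date(raw):
    """Return the date as an ISO string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(facts, today, max_age_days=365 * 3):
    """Drop stale facts, normalize formats, then pick the most common value."""
    fresh = [f for f in facts if (today - f[3]).days <= max_age_days]
    values = [normalize_date(raw) for _, _, raw, _ in fresh]
    values = [v for v in values if v is not None]
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]


result = clean(FACTS, today=date(2008, 1, 1))
print(result)
```

In this sketch the fact from "site-d" is ignored as too old, the other three values are rewritten into one date format, and the value that two of the three remaining sources agree on wins. A real system would presumably weigh sources more carefully than a plain vote.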