One of the challenges of the Google Book Search project has been to find a way to index all of the books included within the project.
We don’t know the details of the technology used to index those books, but a little research uncovers some interesting information.
A post at Search Science this November reported that Google had awarded a grant of $107,112 to Rada Mihalcea. Xan Porter noted there that Professor Mihalcea’s research involving “automatic extraction methods to retrieve significant information in books stored in electronic format” is likely what interested Google in getting her help for Google Print, or Google Book Search, as it is now known.
It’s impossible to tell whether TextRank is what is being used to index those books.
If you are familiar with PageRank, Kleinberg’s Hyperlink-Induced Topic Search (HITS), and WordNet, you might find the patent application’s description of how TextRank would work interesting.
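
For readers who want a feel for the general idea, here is a minimal sketch of a TextRank-style keyword ranker in Python. It is not taken from the patent application, and it says nothing about what Google actually runs. It only illustrates the PageRank-like notion of words "voting" for the words that appear near them. The function name `textrank_keywords`, the `window` size, the damping value of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
from collections import defaultdict


def textrank_keywords(words, window=2, damping=0.85, iterations=50):
    """Rank words with PageRank-style iterations over a co-occurrence graph.

    Illustrative sketch only: the parameters are assumed defaults,
    not values taken from any patent filing or production system.
    """
    # Build an undirected graph: two distinct words are linked when they
    # appear within `window` positions of each other in the text.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)

    # Give every word the same starting score, then repeatedly let each
    # word pass a share of its score to its neighbors, much as PageRank
    # lets pages pass score along links.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest-scoring words first.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    text = "google book search uses ranking to index each book in book search"
    print(textrank_keywords(text.split())[:3])
```

A real system would also filter out stop words and might use WordNet to group related word senses. This sketch skips both steps to stay short.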