Searching in Google’s Book Search (The SEO of Books?)

Unlike Web pages, there are no links in books for Google to index and use to calculate PageRank. There’s no anchor text in links to use as if it were metadata about pages being pointed towards. Books aren’t broken down into separate pages that have a somewhat independent existence of their own the way that Web pages do, with unique title elements and meta descriptions and headings. There isn’t a structure of internal links in a book, with file and folder names between pages or sections that a search engine might use to try to understand and classify different sections of a book, like it might with a website.

A Google patent granted today describes some of the methods that Google might follow to index content found in books that people might search for. It’s probably not hard for the search engine to perform simple text-based matching to find a specific passage that might be mentioned in a book. It’s probably also not hard to find all of the books that include a term or phrase in their title or text, or which were written by a specific author. But how do you rank those? How do you decide which to show first, and which should follow?

Google was granted a patent on Query-independent entity importance in books today, originally filed on July 25, 2010. The inventors include David Petrou, Chiu-Ki Chan, Daniel Loreto, Jeffrey C. Reynar, and Nikola Jevtic.

Google’s indexing of books explores and collects information about entities, or specific people, places, dates, events and things mentioned in those books.

An importance score might be created about each of those entities based upon a number of factors, such as:

1. How much information about a specific entity is included in the book and where

The patent tells us that the appearance of an entity in different sections of a book may influence how much weight each entity might carry, such as inclusion of the entity in places like:

  • Front and back covers,
  • Book flap,
  • Copyright page,
  • Table of contents,
  • Foreword or afterword,
  • Index,
  • Reference section,
  • Chapter heading,
  • Chapters,
  • Special pages within chapters (such as the first page of the chapter), and
  • Atypical pages (e.g., pages that do not contain much text).

The patent does provide some hints as to which locations might carry more weight (such as a mention in the first part of the first chapter being very significant) and which parts might carry much less weight, such as in the copyright notice.
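To make the idea of location-based weighting a little more concrete, here is a minimal sketch of how mentions in different parts of a book might be weighted. The weight values, the section names, and the `location_score` function are all my own assumptions for illustration; the patent doesn't publish actual numbers, only hints like the first part of the first chapter counting for a lot and the copyright notice counting for very little.

```python
# Hypothetical weights per book section. These numbers are assumptions
# made for illustration; the patent does not disclose actual values.
SECTION_WEIGHTS = {
    "first_chapter_opening": 3.0,  # hinted to be very significant
    "chapter_heading": 2.0,
    "table_of_contents": 1.5,
    "index": 1.0,
    "chapter_body": 1.0,
    "copyright_page": 0.1,         # hinted to carry much less weight
}
DEFAULT_WEIGHT = 0.5  # assumed fallback for sections not listed above


def location_score(mentions):
    """Sum the section weights over (section, count) pairs for one entity."""
    return sum(
        SECTION_WEIGHTS.get(section, DEFAULT_WEIGHT) * count
        for section, count in mentions
    )


# Example: where one entity shows up in one (made-up) book.
dylan_mentions = [
    ("first_chapter_opening", 2),
    ("chapter_body", 10),
    ("copyright_page", 1),
]
print(location_score(dylan_mentions))
```

A single mention at the start of the first chapter ends up counting far more here than a mention on the copyright page, which is the general behavior the patent hints at.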

2. Whether third-party references point to a particular book and to its mention of specific entities.

These references can include things such as:

  • Book reviews,
  • “About the book” information,
  • Book citations,
  • Scholarly citations, and
  • World-Wide Web references

If these references are frequently referenced themselves elsewhere, they may carry more weight. As we’re told in the patent:

For example, if a scholarly article cites a particular chapter of a book, and the article mentions an entity that is also mentioned in the cited chapter, the references module elevates the importance score of the entity.

In one embodiment, the third party references considered by the references module have a greater influence on the importance score than the intra-book references considered by the book context module. Third-party references are considered less partial and, therefore, are considered better signals of the importance of a section or entity in the book.
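One way to picture that embodiment is a simple weighted sum in which third-party signals get a larger multiplier than signals from inside the book. This is only a sketch under my own assumptions: the `combined_importance` function, the default weights, and the idea of giving each reference an "authority" value (standing in for how often that reference is itself referenced elsewhere) are hypothetical and not taken from the patent.

```python
def combined_importance(intra_book_score, reference_authorities,
                        intra_weight=1.0, reference_weight=2.0):
    """Combine an intra-book score with third-party reference signals.

    reference_authorities holds one number for each third-party reference
    that cites the relevant section and also mentions the entity. A larger
    number stands in for a reference that is itself cited more often.
    The default weights are assumptions chosen so that third-party
    references count for more than intra-book context.
    """
    reference_score = sum(reference_authorities)
    return intra_weight * intra_book_score + reference_weight * reference_score


# An entity with a modest in-book score and two outside references.
print(combined_importance(5.0, [1.0, 0.5]))
```

With these assumed weights, an extra unit of third-party evidence moves the score twice as much as an extra unit of in-book evidence.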

3. Whether or not the sections of a book that include that entity are accessed more than other sections of the book

If people search for the book online and access different parts of it, what do they look at? Are there some sections that get visited more than others? If so, what are those?

4. How frequently that entity is mentioned in the book compared to how frequently the entity is mentioned in other books in the collection of books.

So, for instance, a book that mentions New York City more frequently than other books that mention New York City might be seen as having a higher importance score for the entity “New York City.”
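A rough way to express that comparison is to divide how often an entity is mentioned in one book (as a share of all entity mentions in that book) by how often it is mentioned across the whole collection. The function names and the sample counts below are made up for illustration, and the patent may well compute this differently.

```python
def mention_rate(counts, entity):
    """Share of all entity mentions in `counts` that belong to `entity`."""
    total = sum(counts.values())
    return counts.get(entity, 0) / total if total else 0.0


def relative_importance(entity, book_counts, collection_counts):
    """Ratio of the entity's mention rate in a book to its rate in the collection.

    A value above 1 means the book talks about the entity more than the
    collection does on average. This ratio is an assumed scoring rule,
    not a formula from the patent.
    """
    collection_rate = mention_rate(collection_counts, entity)
    if collection_rate == 0:
        return 0.0
    return mention_rate(book_counts, entity) / collection_rate


# Made-up mention counts for one book and for the whole collection.
book = {"New York City": 30, "Boston": 10}
collection = {"New York City": 200, "Boston": 300, "Chicago": 500}
print(relative_importance("New York City", book, collection))
```

In this toy data, New York City makes up three quarters of the book's entity mentions but only a fifth of the collection's, so the book scores well above 1 for that entity.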

Other Ranking Considerations

The patent also describes some different ways that this kind of information might be presented, such as showing location information on a map, or event information on a timeline, or facts about a person in text or a table.

A search engine might also look at metadata about books that it indexes, which is often presented as structured data such as author’s name, publisher, year published, number of pages, edition, Dewey Decimal Classification, Library of Congress classification, ISBN, and more.

Other query-independent factors that a search engine might consider can include the sales volume for a book, or its current position on a bestseller list.

An overall ranking for a book in response to a query might include both these query independent scores as well as query dependent signals, such as the number of terms in a query that match those in a book, synonym matching, and other information retrieval techniques.
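As a sketch of how those two kinds of signals might be blended, the example below mixes a very simple term-overlap score (query dependent) with a precomputed importance score (query independent). The `alpha` blending parameter, the function names, and the overlap measure are all assumptions on my part; a real system would use far richer matching, including the synonym matching the patent mentions.

```python
def query_match_score(query, book_text):
    """Fraction of distinct query terms that appear in the book text."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    book_terms = set(book_text.lower().split())
    return len(query_terms & book_terms) / len(query_terms)


def overall_score(query, book_text, query_independent_score, alpha=0.5):
    """Blend query-dependent and query-independent signals.

    alpha is an assumed mixing weight between 0 and 1, and
    query_independent_score is assumed to be already scaled to 0..1.
    """
    dependent = query_match_score(query, book_text)
    return alpha * dependent + (1 - alpha) * query_independent_score


# Two of the three query terms appear in this (made-up) book text.
print(overall_score("bob dylan albums", "Bob Dylan released many records", 0.9))
```

Raising `alpha` would favor books that match the wording of the query, while lowering it would favor books that score well regardless of the query.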

Conclusion

I’m a believer in the idea that if you want to understand something well, you need to be able to step a little outside of it and look at it from a different perspective. If you want to be a good writer, it helps to learn a different language and see how its rules and manners of expression differ from your own. If you want to learn about how an operating system works well, it helps to install and learn about a different operating system so that you can understand the similarities and differences between the two.

I don’t think that people will start thinking about these types of “ranking signals” when they start drafting a book anytime soon. I have brushed over a number of the approaches that Google’s patent describes on how they might rank books in a book search. I pointed out in the beginning of this post a number of the differences between books and web pages, and how those might make ranking those different. But there are also many similarities, and it helps to understand both the how and why of those differences.

Google has been using information extraction approaches to collecting information about entities that it finds on the Web, and it wouldn’t be surprising to see more of the ideas behind how books might be indexed flowing over into how web pages and websites are indexed.

21 thoughts on “Searching in Google’s Book Search (The SEO of Books?)”

  1. Interestingly, a lot of the factors that Google would use to rank books in “book search queries” seem to be the same factors that webmasters would concentrate on when considering on-page SEO such as “book title-site title”, “chapter title-page title”, “keyword density-keyword density”. IMO, book search seems like a natural extension of web-page search in almost every way.

    Funny, many authors will end up being online publishers without ever having launched a site…:) Authors may eventually start doing keyword research before writing…;)

    Mark

  2. It seems like the upcoming era is one of book optimization, and writers, after pitching the idea, will need to do keyword research and come up with the list of keywords that they have to use in the book title, the author’s introduction, the content of the book and elsewhere….

    Obviously, when well optimized books start coming onto the internet (where there is a good there is an evil) there will be spammers, and then maybe sooner or later people will have to collect more reviews for their books, and reviews from authentic sites, and there is a much longer list of algorithms that Google can use for its search engine for books… the only thing that bugs me is the quality of creative writing, which is going to be affected because of all this…

  3. Fascinating read Bill. When I read this post, I couldn’t help but think of the nature of both books and webpages as ‘documents.’ When I thought about it this way, I realized that books are like stone tablets and web pages are dynamic documents. While thinking thus, I realized that book optimization without links is a very different ball game, it will be interesting to see what the future holds.

  4. Hi Mark,

    There are a few major differences between indexing books and indexing web pages. Once a book is completed, it rarely changes in any way. And, while one book may mention another or share different entities – specific people, places, things, events, dates, etc., there are no express links from one book to another.

    There are aspects of book search and web search that can share a lot of the technology behind indexing, and I agree that it’s worth exploring those similarities.

    I’m not sure how much “express” keyword research the author of a book should or might do, but if they want to write a book that develops any audience at all, they should write something that will appeal to an audience, and use words in their books that people interested in what they write about will expect to see.

    For instance, if I write a book about Bob Dylan, chances are that I’m going to note that he changed his name from Robert Zimmerman, I’ll probably include the names of many of the albums he released, and the names of a good number of his songs. I may include information about the people he performed with, and other people who were vital to his career. I could mention some of the pivotal performances that he gave and where and when those took place. I’d definitely be doing a lot of research, though I might not do most of it in Google’s Keyword Selection Tool. Many of the things I am mentioning here are definitely “entities” though – specific people, places, things, dates, and events. And these seem to be the kinds of things that Google’s book search seems intent on capturing.

  5. Hi Moosa,

    It’s interesting to watch the growth of ebooks, and that many books may now make it to the web without having ever been published in print at all.

    I hope we don’t start seeing an era of books with exact keyword titles in the future (though many nonfiction books do try to describe very precisely what they might be about through their titles).

    It was a little surprising and interesting to see the patent mention that some books may rank better in a book search based upon things like online reviews about those books.

    I hope the quality of books doesn’t suffer because authors focus more upon how findable their books might be online instead of how engaging or interesting or helpful they might be.

  6. Hi Jey,

    That’s a great analogy. I think there’s some merit to comparing books to stone tablets, though a good book can influence other writers, inspire responses through online and offline writings, and be the start of a conversation that gets played out in many places.

    In spite of the differences between the indexing of books and the indexing of web pages, I can’t help but wonder how Google’s analysis of language in books might influence rankings on Web pages. If Google indexed all the books in the world that they can, and created an incredible index of all of these different types of entities it finds within them, and the language associated with those entities, will that have implications for web search?

  7. Are you saying that books might show up in Google results? Or do you think books will still have a separate, proprietary search? It will be very interesting to see how Google integrates real time search with one of the oldest mediums of information… the book. The past 2 years they have been pushing for integrating more “real time updates within their searches” and I think this is definitely a step in the opposite direction.

    Although, as with everything, the success of it relies heavily on its implementation.

  8. Hi Matt,

    My post isn’t about books showing up in Google’s regular Web search results, though Google will sometimes show book search results in Google’s organic web results through Universal Search. See my post How Google Universal Search and Blended Results May Work and some of the links within it for some ideas on how that might work.

    My post is about Google’s patent that describes how they index different types of entities that appear in books, and how Google might rank those books for those entities – specific people, places, events, dates, and things. If Google thinks that book results and real time results are both relevant enough to include in Web search results, we may see both. I don’t believe that’s a problem in and of itself.

    For example, let’s say that an author of a number of books is being discussed on social networks, and someone searches for the author’s name in Google’s main web search. Google may return real time social results and book results along with web results. I think showing both would be a good thing for searchers.

  9. If books are released only once and it takes time before there is a new edition, it is in the authors’ best interests to be careful about how they write their books, do proper research and make choices, depending on their motives. If they don’t mind about it and just want to write a book the way they would if Google didn’t exist, and release it without questions, it is their choice, and it would take some weight off their thinking time. I would find it interesting to see how they will readjust their ranking with the books.

  10. I think Google is at the top of its game with anything SEO related. It has been and continues to be the best search I have used personally for finding books, documents and other webpages. Nothing but the best from Google!

  11. Hi PerceptionRedemption,

    Hopefully most of the motivation behind writing a new edition of a book is that there’s new information available, some of the old information or examples might be dated, and people are looking for a new version – there’s actually a demand for it.

    I do know that there are at least some authors out there who are aware that search engines may index the content of their pages, and that search engines may be helpful in reaching and expanding the audience for their books.

  12. Hi Jamie,

    Google’s focus is very much on meeting the demand of searchers, their interests, and how they look for information. That’s part of the reason why we see them doing more with real time social search, for one example. Another is doing things like enabling people to do searches based upon images that people might create with the cameras on their phones, given the increase in mobile searching and in phones that have cameras. It makes things interesting.

  13. I wonder if ‘book links’ will be the new SEO gold dust as Google begins indexing literature – the highest form of editorial citation!

    As a webmaster, I would appreciate being able to tag links with specific types of contextual elements based on how I used the source page, ex: citation, supporting point, counter-point, etc. For printed works that are being indexed, perhaps authors could use a character that followed the inline citation (ex. *^~) to classify the type of reference.

    It will be interesting to see if changes are made to the traditional referencing styles like MLA and APA to better comply with the digital age.

  14. Great read, I love this post because it really makes you see how far indexing has advanced in its many forms from basic libraries to the huge beast that is Google.

  15. Hi Dan,

    Searching through books has its own unique challenges, and it is different enough from searching through the Web that I thought it was pretty interesting to think about the differences. Like you wrote, it shows how far indexing has come, though I think it’s possible that some of the ideas around indexing books the way that Google is working at might not have been something that people were doing when they were working on indexing documents before the Web was around.

  16. I think that these ranking factors might work for non fiction (the piece offers an example of the number of times ‘New York City’ is included within a text as a factor in assessing relevance to a particular place), and this will likely work very well for travel books or even history texts. I foresee more problems when trying to scientifically evaluate fiction, however!

  17. Hi Simon,

    That’s something that I wrestled with a little as well. There’s definitely value to this approach when it comes to nonfiction, but is there any when it comes to fiction as well?

    I do think that there is, especially when an author includes details and information about people and places and things that people might be interested in finding and reading. Some fiction takes and makes locations or events as much of the star of their work as the fictional characters contained within them. Being able to find mentions of those entities and where they are located within a piece of fiction might have a different kind of value than when you are looking for purely factual information, but it still can be pretty valuable.

  18. I think that it is a fantastic idea to try to index books and rate them by a pagerank factor instead of just the ordinary Amazon reviews. You can tell a lot about a book by the amount of good content in it but that isn’t indexed like you have mentioned. Maybe if they used Captcha solving to put together books in a web-based format this would be easier (isn’t this what they’re doing now?) I would love to see something like this happen in the future.

    By the way, responding to Simon Jones, I think there must be a way to evaluate fiction. Fiction has quality just like non-fiction does. That’s why Amazon reviews are usually accurate. People rate good books when they are good and that means there is some sort of internal denominator in good fiction. Maybe they could map out character development, enjoyment of reading it etc. We’ll see what happens in the future.

    Thanks Simon.

  19. Hi Ryan,

    Since books don’t actually link to each other, we do have to throw actual PageRank out of the equation completely, but books citing other books is similar in a few ways that could be useful. And the fact that Google has taken this on means that they can learn how to work with webpages that might cite something without linking to it.

    Being able to index the content of books electronically, analyze what entities appear within them, learn what terms and concepts tend to co-occur within them, and use those types of factors to rank books provides ideas on how to index other content on the web (even pages with links) a little differently.

    I’m not convinced that the indexing of nonfiction is any easier or more difficult than the ranking of fiction.