Googlebot Doesn’t Read Pictures of Text During Web Crawls
When I was an Administrator at Cre8asiteforums (2002-2007), one of my favorite forums on the site was the Website Hospital. People would bring their sites and ask how they could improve them. A recurring problem was that sites weren't being found in search results for geographically related queries. A common symptom was that the only place the business's address appeared on the site was in pictures of text rather than in actual text. That can be a problem when it comes to Google indexing that information. Google tells us that it prefers text and can have trouble indexing content found within images:
Google’s web crawler couldn’t read pictures of text, so Google wasn’t indexing the location information for those sites. Site owners were often happy to find out that they just needed to include their business address in text, so that Google could crawl and index it, making it more likely that they would be found for their location.
Under this new patent, Google adds a diversified set of trusted pages to act as seed sites. When calculating rankings for pages, Google would calculate a distance from the seed pages to the pages being ranked. The use of a trusted set of seed sites may sound a little like the TrustRank approach developed by Stanford and Yahoo a few years ago, as described in Combating Web Spam with TrustRank (pdf). I don’t know what role, if any, the Yahoo paper played in the development of the approach in this patent application, but there seem to be some similarities.
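Measuring a distance from a set of seed pages can be pictured as a multi-source shortest-path search over the link graph. Below is a minimal Python sketch of that idea, assuming the link graph is stored as a dict mapping each page to its outgoing links. The function name `seed_distances` and the rule that a link's length equals the number of outgoing links on the page it appears on are illustrative assumptions, not details taken from the patent.

```python
import heapq
from typing import Dict, Iterable, List


def seed_distances(
    links: Dict[str, List[str]],
    seeds: Iterable[str],
) -> Dict[str, float]:
    """Return the shortest link distance from any seed page to each reachable page.

    Hypothetical link-length rule (an assumption for illustration only):
    following a link from a page with k outgoing links costs k, so links
    on pages with many outlinks count as "longer".
    """
    dist: Dict[str, float] = {}
    # Every seed starts at distance 0 (multi-source Dijkstra).
    heap = [(0.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if page in dist:
            continue  # already settled with an equal or shorter distance
        dist[page] = d
        outlinks = links.get(page, [])
        for target in outlinks:
            if target not in dist:
                heapq.heappush(heap, (d + len(outlinks), target))
    return dist


links = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d", "e"],
    "c": [],
}
print(seed_distances(links, ["seed"]))
# {'seed': 0.0, 'a': 2.0, 'b': 2.0, 'c': 3.0, 'd': 5.0, 'e': 5.0}
```

Under these assumptions, a page that never appears in the result can't be reached from any seed. How distances would be turned into a ranking signal isn't covered here.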
An authoritative user is a user of one or more computer-implemented services (e.g., a social networking service) that has been determined to be authoritative (e.g., an expert) on one or more topics that can be associated with one or more queries
I read the patent on Tuesday and decided to revisit it after reading a post this morning by Mark Traphagen at Moz, titled Will Google Bring Back Google Authorship? It’s a good question, and Mark brings up a fair amount of evidence to support the idea that Google might bring back the concept of author authority in search results, even if it doesn’t bring back or rely upon authorship markup (adding a rel=”author” to a link to your Google+ profile from a page you write on, or linking to pages you contribute to from your Google+ profile). As Mark notes:
A few years ago, I wrote a post about Google’s OneBox Patent Application. I was brought back to it by a new Google patent that looks at answering questions within similar answer boxes and showing rich content, like in the example below:
A patent filed by Google a couple of years ago and granted today takes another look at OneBoxes, and includes this statement early on:
A search engine provider, Google Inc. of Mountain View, Calif., has developed an “answer box” technology, known as OneBox, that has been available for several years. Using this technology, a set of web search features are offered that provide a quick and easy way for a search engine to provide users with information that is relevant to, or that answers, their search query. For example, a search engine may respond to a search query regarding everyday essential information, reference tools, trip planning information, or other information by returning, as the first search result, information responsive to the search query, instead of providing a link and a snippet for each of a number of relevant web pages that may contain information.
I was excited to see a Google patent granted this past Thursday, which describes how Google may rank pages in part based upon user feedback (clicks) in response to rankings for those pages. The patent tells us that identifying a searcher’s needs and determining which returned documents might be most useful can involve “a fair amount of mind-reading—inferring from various clues what the user wants.” However, a Google spokesperson recently told us that such clues can be misleading. I still thought the patent was worth pointing out.
Some clues may be user-specific, the patent authors tell us. For example, when a searcher searches from a mobile device and Google knows the location of that device, taking that location into account “can result in much better search results for such a user.” That does make sense.
Barbara and I have been looking at a lot of patents while preparing for the presentation, and one of the topic areas we planned to discuss was Quality Scores, since one of the patents describes adding “Buy Now” buttons to paid search listings, but possibly only for sites with a high enough Quality Score.
While we were preparing, Barbara pointed out another patent to me that focuses on low quality scores. It describes how a site might lose traffic if ranking scores for links pointing to it fall below a certain threshold.
Added 6-17-2015 – It’s not clear from the new patent filings, but based on feedback I received on Twitter from Mathieu Janin, at https://twitter.com/Matt_Refeo, it appears that Google may be showing sitelinks for pages other than a site’s home page. As Mathieu tweeted to me:
I performed this query again on the French version of Google, and it shows sitelinks for an internal page on the site: