How Google Data Centers may be Split between Regional and Global Data

A Google patent granted last week describes how the indexes at different Google data centers may contain pages that are indexed and classified as global, and pages that are indexed and classified as regional. Last summer, I wrote about how Google may predict which data center might provide the best results for a query. Google was also granted a number of patents last August that provided some insights into how Google’s Planet Scale Distributed Storage of Data may work.

A screenshot from Google's patent on regional indexes showing that different data centers contain both regional and global content.

Those patents from last summer give us an intriguing but incomplete look at the pages contained in Google’s data centers. The newly granted patent appears to fill in some significant gaps. Imagine that each data center might contain some unique pages and content that’s regional in nature, and some content that might be replicated across more than one data center that’s global in nature. The global content could potentially take up between 50% and 75% of storage area on each data center.

The process of determining whether a page is regional or global relies on document classification scores. A global index includes documents that have a high quality score, based on signals that include popularity metrics such as PageRank, regardless of the locations of the people who view them. This global content is considered to be world-wide.

Content that isn’t world-wide could be included within a particular index as regional content, and may be placed in the regional index at a data center because it matches characteristics of the queries received at that particular data center. For example, if 75% of web queries from Lithuania are in the Lithuanian language, then many of the pages within the data center for those searches may be in Lithuanian. Pages that are popular in Lithuania but aren’t in the Lithuanian language may also be included in the regional index for that data center if those pages aren’t popular enough elsewhere to be included in the global index.
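The Lithuanian example can be read as a simple frequency count over the queries a data center receives. Here is a minimal sketch in Python, assuming a hypothetical list of query languages seen at one data center and an arbitrary 50% cutoff. Neither the input format, the function names, nor the cutoff comes from the patent; they are only there to illustrate the idea.

```python
from collections import Counter


def language_shares(query_languages):
    """Return each language's share of the queries seen at one data center."""
    counts = Counter(query_languages)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def dominant_languages(query_languages, min_share=0.5):
    """Return the languages whose share of queries meets the assumed cutoff."""
    shares = language_shares(query_languages)
    return sorted(lang for lang, share in shares.items() if share >= min_share)


# Hypothetical sample: 75 of every 100 queries are in Lithuanian ("lt").
sample_log = ["lt"] * 75 + ["en"] * 15 + ["ru"] * 10
```

With that sample, Lithuanian is the only language that clears the cutoff, so pages in Lithuanian would be the natural candidates for that data center's regional index.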

Chances are that many or most queries performed by searchers may be sent to the data center nearest them to be responded to, but they don’t have to be, as discussed in the prediction patent that I linked to in the first paragraph.

It’s also possible that a query in Chinese sent to a US data center may be re-routed to a Chinese data center, or it could be processed by both a US and a Chinese data center. It appears that Google has chosen, instead of trying to replicate its index everywhere, to use a machine training approach to place regional content closest to where it might be needed most often.
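One way to picture that routing choice is as a lookup against the languages each data center's regional index covers. The Python sketch below is only an illustration under that assumption: the DataCenter class, the route_query function, and the language sets are all hypothetical, and the patent may weigh many more query characteristics than language.

```python
from dataclasses import dataclass, field


@dataclass
class DataCenter:
    name: str
    # Languages assumed to be covered by this data center's regional index.
    regional_languages: set = field(default_factory=set)


def route_query(query_language, nearest, others, also_query_nearest=False):
    """Pick the data center(s) that should answer a query."""
    # The nearest data center handles the query if its regional index fits.
    if query_language in nearest.regional_languages:
        return [nearest]
    matches = [dc for dc in others if query_language in dc.regional_languages]
    if not matches:
        # No regional index fits, so fall back to the nearest data center,
        # which still holds the replicated global content.
        return [nearest]
    targets = [matches[0]]
    if also_query_nearest:
        # Optionally process the query at both data centers.
        targets.insert(0, nearest)
    return targets


us = DataCenter("us-east", {"en", "es"})
china = DataCenter("asia", {"zh"})
```

Calling route_query("zh", us, [china]) returns only the Chinese data center, while passing also_query_nearest=True returns both, matching the two possibilities described above.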

This approach does provide some interesting and peculiar results, though. If I search at Google in traditional Chinese characters for an auto mechanic while in Virginia, it looks like I might be getting results from China, even though I might be better off getting information about local car repair places near me in Virginia. Note that the location setting in the left sidebar in the image below is set at “Warrenton, Virginia.”

Google search results for a query in Traditional Chinese made from Warrenton, Virginia, showing results that appear to be from a Chinese data center.

The new Google patent is:

Regional indexes
Invented by Gautham Thambidorai, Eisar A. Lipkovitz, Cosmos Nicolaou, and Li Fan
Assigned to Google Inc.
US Patent 8,131,712
Granted March 6, 2012
Filed: October 15, 2007

Abstract

A corpus of documents is identified, such as a large corpus of web documents. A quality score is applied to each, and at least some of the documents in the corpus of documents are identified based on their respective quality scores.

At least one query characteristic, for instance, the language of a query, associated with a plurality of search queries is identified. A subset of documents in the corpus of documents is identified that satisfy the at least one query characteristic. An index is built that includes the identified at least some documents and the identified subset of documents.
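Read literally, the abstract describes an index made of two parts: documents chosen by quality score, plus a subset that matches a query characteristic such as language. Here is a minimal Python sketch of that combination. The Document fields, the 0.8 threshold, and the build_index name are assumptions made for illustration and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    url: str
    language: str
    quality: float  # assumed to be a score between 0 and 1


def build_index(corpus, query_language, quality_threshold=0.8):
    """Combine high-quality documents with those matching a query characteristic."""
    # "Global" part: documents selected on quality score alone.
    global_docs = [d for d in corpus if d.quality >= quality_threshold]
    # "Regional" part: remaining documents that match the query characteristic.
    regional_docs = [
        d for d in corpus
        if d.language == query_language and d.quality < quality_threshold
    ]
    return {"global": global_docs, "regional": regional_docs}


# Hypothetical corpus with placeholder URLs.
corpus = [
    Document("https://example.com/world-news", "en", 0.95),
    Document("https://example.lt/vietos-naujienos", "lt", 0.40),
    Document("https://example.de/lokales", "de", 0.35),
]
index = build_index(corpus, query_language="lt")
```

In this sketch the German page ends up in neither part of the Lithuanian data center's index, which is the storage saving the regional approach appears to be after.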

More Local Web Pages in Google Search Results

In a post from David Naylor’s web site a couple of days ago, titled The Biggest Change In SEO To Date?, David Whitehouse wrote about being surprised to see his site and some other sites from nearby businesses listed on the first page of Google on a search for [SEO], after usually seeing sites that were more globally popular for that query in the past. Those results aren’t Google Place pages blended into search results from Google Maps, but rather Web pages. He reset his location in Google to some other locations in the UK, and noticed that a range of pages on the first page of Google were being changed to reflect the new locations.

Back in 2009, I noticed one client ranking very well, in the fourth position within Google’s search results, for a very generic and hard-to-compete-for term when my location was set to be near that client’s location. If I changed my location setting in Google, I would see another website in that same slot based upon the new location. While that “locally influenced ranking” persisted for a good number of months, it disappeared as quietly as it had started. If I try that same search now, I see a couple of web sites from local businesses showing up in those search results again, and they change when I change my location.

Again, these aren’t Google Maps results blended into web search results, but rather web pages from businesses tied to the location listed in my Google search settings.

A Google Inside Search blog post from February 27th, Search quality highlights: 40 changes for February, noted a few “improvements” for local search results:

One seemed to be an improvement involving showing “local” web pages within web search results:

Improved local results. We launched a new system to find results from a user’s city more reliably. Now we’re better able to detect when both queries and documents are local to the user.

Another improvement involved showing more Google Places search results blended into web results when appropriate:

Improvements to ranking for local search results. [launch codename “Venice”] This improvement improves the triggering of Local Universal results by relying more on the ranking of our main search results as a signal.

And another involved showing more local relevant results in YouTube:

More locally relevant predictions in YouTube. [project codename “Suggest”] We’ve improved the ranking for predictions in YouTube to provide more locally relevant queries. For example, for the query [lady gaga in ] performed on the US version of YouTube, we might predict [lady gaga in times square], but for the same search performed on the Indian version of YouTube, we might predict [lady gaga in India].

Takeaways

Google does seem to want to show more local web results in response to queries in Web search, in addition to Google Maps results that may also be included with those results.

From a technical standpoint, it makes sense for Google to include more than one index at its data centers, with one index that includes world-wide content, and another that includes regional content frequently searched for at the nearest data center by searchers performing queries for that regional information. If a relatively rare search, like one for “automobile mechanics” written in Traditional Chinese, is performed in Virginia, it seems to make sense to re-route that query to a Chinese data center, though that might be a little far to take your car to get it repaired. :)

Google’s approach to including some more “regional” content in search results based upon location may or may not be related to the data center that a query takes place at, but it’s worth exploring Google’s recently returned emphasis on showing more local results in response to queries and the context of location from those searches.

Don’t confuse location with personalization either. If you missed it, an interview that Eric Enge conducted with Google’s Jack Menzel on personalization at Google is worth reading. Here’s a snippet:

Sometimes results that are really a result of context get misinterpreted by people as personalization. If I respond to your query in your language that is really about context, not personalization. Personalization is more about recognizing that I like Dominion the card game and you really like Dominion the power company, and someone else really likes a videogame called Dominion. Imagine you turned off personalization, and suddenly Google was responding to all of your queries in the wrong language, you would be like “oh come on”.

Are you seeing “local” web pages in web search results that you hadn’t been seeing before?

36 thoughts on “How Google Data Centers may be Split between Regional and Global Data”

  1. The growing share that Google gives to localized results (in the form of a Google Maps business listing or a “mixed” display of website and local data) underscores the need for a business to optimize its local search presence (all the more so because this kind of optimization also fits very well with the growing use of mobile phones).

  2. I think this will work really well along with regional targeting in Google Webmaster Tools. This is probably a crazy idea but if Google wants to get good at local results then they need to provide a drill down option within GWT. For example, at the moment you can set a site to let’s say “UK” but what if a local business had the option to set its targeting to just “London”.

  3. Our business is 100% local and we welcome, yet are a little nervous of, Venice at the same time. Before, 75% of our search terms were dominated by Places results. This limited our successes as we have more than 5 services, but we optimised all our service pages with a local twist to cover our bases. With Venice, this should help us improve traffic, but don’t you think Venice may help sites that optimise by location that’s super specific – such as a certain area of a large city? Many of our competitors do this, throwing out weak content to do so. If Google is getting smarter at geo-locating searchers they may be rewarding this type of strategy when in actual fact it goes against the whole idea of offering quality, unique content.

  4. Bill,

    I did a bit of digging into which ‘local’ sites seem to be getting promoted as part of the ‘Venice’ update over at http://www.epiphanysearch.co.uk/blog/google-venice-happy-days-for-local-businesses/ and was pretty surprised to see the limited number of metrics Google seems to be using to determine whether a site is worthy of promotion in the SERPs for a given location.

    Have you seen anything else from Google that might confirm / give more insight into how a site ‘qualifies’ for such a boost in the SERPs in this update, beyond datacenters?

  5. This is some great information. I really think Google is trying to localize its search results and personally I think it is a good idea. I like the idea that if I am looking for a pizza in Jupiter, Florida I will at least have the option of maybe finding a local pizzeria before being slaughtered with national brands like Domino’s, Pizza Hut, etc.

    Do you think that this localization will be determined not only by Google’s address of your location but perhaps by some on-page schema programming? Do you think that would be in your favor to have that on your page?

    thanks

    sean

  6. In my opinion, Google local search results are a headache for SEO. If you aren’t running a local business website you can’t control how your keywords are ranking through all Google datacenters. Or you can do it, but it’s hard work to manually check your rankings in all Google local sites.
    And in my experience, the search results may vary from one datacenter to another, but if they really want to focus on local search they have to do better. For example, by giving more importance to sites that are set to a country in Google Webmaster Tools.

  7. “Google does seem to want to show more local web results in response to queries in Web search, in addition to Google Maps results that may also be included with those results.” This is great news for us local SEO’ers. ;) But in all seriousness this does make sense. As a searcher I’d want to get more local results for searches or at least some sprinkled in.

  8. Google seems to be moving toward more local search results. I must say I can only cheer at this. Competing for a good SERP with non-relevant sites is outdated.
    When someone searches from the Netherlands they want samples for the Netherlands first or to be able to filter.

    We will see soon enough how this will change our daily job.

  9. What will happen if all google data centers are destroyed?

    Like, I presume that the data we have in our g-mail, is stored in some big data centers owned by google. What will happen to all this data if some worldwide attack by competitors or terrorists destroys all/majority of these? How well is this data protected?

    I just thought that it would be quite impossible to do almost anything if all the data from g-mail got destroyed…

  10. I’m seeing some changes in the last few days on Google rankings for searches for local businesses in my area. I’m wondering if perhaps the Venice update has hit Canada?

    Thanks for the interesting article. I had no idea this local search thing was so complicated!

  11. Well one thing is for sure – it is a full time job keeping up with all of the changes at Google and now Facebook. Although I must say I really enjoy doing it.

  12. Hi Frédéric,

    A translation (from Google) of your comment:

    The increasing share attributed to localized results from Google (as a business card or Google Maps to display a “mix” website and given localized) stresses the need for a company to optimize its referencing local (to provided that this type of referencing also responds very well to the increasing use of mobile phones).

    When providing information about your location or locations is essential to the objectives of your business, because you have locations that people can visit or because you provide a service within certain areas, it is important to send some clear signals to the search engines about where those locations actually are.

    This can mean using schema.org meta data, including geographic data in key/value formats (phone: 555-555-1212), using intelligent choices of anchor text in links to pages that include contact information or directions or other pages that a search engine might perceive as being strong indications of location, and more.

  13. Hi Yousaf,

    I do think it’s a good idea for Google Webmaster Tools to enable webmasters to indicate the locations that they would ideally like to target. I’m not sure how much that information is being used in the methods described in these geographic based patents.

    For instance, if a particular website gets lots of visits from people in a particular region, even if it’s outside of that region, but the site isn’t popular on a more global level, it might be considered a “regional” result and included in the regional database on a particular data center. From this patent it looks like that analysis is based upon query and user behavior in a particular area associated with a particular data center, rather than signals like what a webmaster might be targeting in Webmaster Tools.

  14. Hi Orli,

    There’s been a lot of speculation about Venice that confuses what it actually is. It appears to be something that adds more Google Maps results when it seems like those might be useful, possibly by doing things like looking at the search results that appear for a particular query (not the documents themselves, but just what appears in the search result – the titles and descriptions and URLs), and seeing if they do things like mention specific places. If they do, that might be a sign to include more maps results.

    Another of the geographical changes mentioned in that update (the last one listed, at # 40), mentioned that Google might insert more local-based organic results than it had been in the past. So, if you perform a search using a somewhat broad query term in Ft. Pierce, Florida, (with your location set at that location), some of the results you see might be localized to that area, and if you change your location to a town in Indiana, you might see a mix of more “global” results as well as a different set of localized results.

    I don’t think that we can make generalizations regarding how localized queries might be displayed in one area and how they might be displayed differently in another area. This seems to be based upon a fairly complicated statistical geographic model that takes into consideration a fairly wide range of both data points and user behavior data. Optimize for some broader areas, and for some smaller areas, and test. And test some more. And then even more.

  15. Hi Andy,

    Your article appears to be more about the localized organic results update that Google has been experimenting with for a few years, which probably has more of a tie-in with these separate regional and global databases at different data centers than it does with the “Venice” update, which only involves whether or not maps results are shown for a certain query.

    I make that distinction not because I’m obsessive-compulsive or fixated on precise language, but rather because I’m afraid that if we refer to one with the name of the other, it’s going to get messy when we try to understand what might be going on.

    I’ve been seeing these kinds of localized results inserted into results for somewhat broad queries since at least 2009, and they’ve disappeared for long periods to reappear later. It seems like with the announcement about them at the Google Inside Search blog, that the search engine has turned them up.

    I do think that the signals that might make a page a good candidate to appear as one of these localized results aren’t all that complex, and might be very similar to those that I discussed in this post:

    10 Most Important SEO Patents: Part 8 – Assigning Geographic Relevance to Web Pages

    I’ve been fortunate to have a few sites see some significant traffic based upon these types of local results, and remember pretty clearly when one site started ranking at # 4 for a very broad, but very relevant query term back in 2009. I noticed that it was only ranking well for that term in searches where a person’s location was set near that area, and when I changed my location, I would see similar “local” results within that # 4 slot that were near my location.

    At that point in time, it only seemed like Google was showing one of these “localized” organic results to searchers, and this recent update includes a number of additional localized results within the top ten results for queries. The behavior looks very similar though.

  16. Hi Sean,

    It’s good that you have a chance of your pages not only showing up in Google Maps type results, but also localized organic results as well, and I think that’s an improvement to what Google has been doing.

    It’s likely that a number of the signals that Google looks at to determine if a site is local to a specific area might mimic to a degree some of the analysis that might be used by Google Maps to associate a business and a web site to a particular address. If you follow my link in the comment above, there are likely some other signals that Google might look at as well.

    I don’t think that using schema.org type of contact information on your pages hurts you either, and can make it easier for Google to pinpoint your location.

    But regardless of how well Google might know your address, a determination of whether your site is a regional one or a global one is an independent one that might be based upon a number of signals outside of this analysis, such as the many different places around the earth that you receive traffic and links and clicks from. Query and user behavior seem to make up a substantial amount of the analysis that might determine that. And I think if you are considered a “regional” site, you have a better chance of being inserted into one of those slots.

    The question really though, is whether it’s better to be considered regional and possibly be placed into a localized organic slot high in search results, or to be considered global and rank well anyway. If the site and business in question is regional in nature, it might be better to be placed into one of those slots. But Google has turned something like this on in the past, and then turned it off for a long while. There’s no guarantee that they might not do that again.

  17. Hi Ruben,

    I suspect that Google is more likely to take the site targeting tool in GWT as a suggestion by webmasters, and look at many other data points as well, including user data and how people at different regions might interact with a site, especially one that might not really be “local” to them.

    For instance, if you set your location to USA in webmaster tools, but your site prominently indicates locations in Canada and Mexico and South America, and it gets lots of visits from those places, which signals should Google give more weight to?

  18. Hi John,

    In my past experiences with Google inserting a more limited set of localized organic results, it is a pretty positive change. It’s great to see them doing more of it.

  19. Hi Dushyant,

    Google has been doing what they can to build more and more data centers, and to improve the technology and data backup systems for those data centers. There’s always been a strong element of how they protect data and include redundant systems to ward off data errors and loss.

    Some of the recent patent acquisitions that Google has made from companies like IBM involve networking systems that include processes for backing up data as well, such as some of the backup tape technologies that IBM pioneered.

    The “global” data that the patent describes as being in separate databases from regional data are replicated across many different data centers as well, and Google might send you to a different data center to provide results if there is some kind of problem at a data center near you. But that doesn’t mean that Google doesn’t have emergency plans and backup capabilities if problems might strike and potentially cause losses to data in a regional database at a local data center.

  20. Hi David,

    Yes, the localized results do seem to be spreading. Not sure if they started making this change in the UK first, or if people there were just more observant than some of the rest of us or more vocal about it.

    I think in some ways, all of SEO is getting increasingly more complicated.

  21. This is actually a really great idea. It’s essentially a larger Google local and you could probably argue that search results from each independent country for the same search terms vary even at a smaller level than their data centers. When processing power becomes easy enough to scale I can imagine each individual searcher having their own search algorithm constructed around their past searching history and clickthroughs. Your own custom Google search results.

  22. There seems no doubt that local results are becoming more pervasive. It just makes sense that if someone punches ‘plumber’ into Google that they get a list of the closest 10 rather than the 10 biggest plumbing suppliers in the nation. Same for all local services. This allows Google to become more relevant for users and return better information. Plus expand returns pages. Bill’s very thorough analysis shows one more step by Google to make returns more localised to the searcher. Interesting to test Chinese characters in a Virginia search. Hard for Google to get inside the mind of the searcher on this and work out what they intend to search. We work in web design and for a brief period about 2 years ago the first page was dominated by local returns but this was removed and we have gone back to the traditional view. I’m assuming Google has decided website design isnt a local service. Not sure what contributes to this decision in the algorithm.

  23. Hi Chris,

    It is a pretty interesting idea. Google does provide personalization in search results based upon things like your past search history and web history, but often that type of information is somewhat sparse. In that case, it’s possible that Google might do personalization based upon searching and browsing histories of “people like you.”

    Google is pretty concerned not only about how efficiently individuals can access data that might be relevant to them, but also about how much data should be shared at every data center as global, and which data doesn’t need to be transmitted to every data center because it’s regional in nature.

  24. Hi jlawrence,

    It definitely seems to be one of Google’s aims to show more local results when there may be an intent from searchers involving location.

    For some categories, like SEO, Google hasn’t shown map type results and I thought that was true for Web design companies as well. I suspect that people searching for those types of services do consider location to be important, so I’m not sure why that particular decision was made.

  25. Hi

    I wanted to know why my website does not show up in google.com but it shows up in other google centers such as google.de in the 11th page for a very competitive keyword.

    My preferred location in google webmaster tools is set at USA.

    Please let me know. Thanks

  26. The best thing about Google’s local search is that it helps local businesses rank well locally. For example, if you make a simple query like “Flower Delivery” using a local Google engine, they will suggest “local flower delivering service” instead of showing data from other regions as it might be on their .com engine. In this case their localized engine is more like a local directory.

  27. Determining local versus global seems kind of tricky. You can’t use the IP of the server alone, so is there XML data that can be included in the page code which lists the address, state and city of the company?

    Thanks in advance,
    -Tony
