When Google ranks businesses at locations in Google Maps, they turn to a number of sources to find mentions of the name of the business coupled with some location data. They can look at the information that a site owner might have provided when verifying their business with Google and Bing and Yahoo. They may look at sources that include business location information, such as telecom directories like superpages.com or yellowpages.com, or business location databases such as Localeze. They likely also look at the website for the business itself, as well as other websites that might include the name of the business and some location data for the business, too.
What happens when the information from those sources doesn’t match? Even worse, what happens when one of these sources includes information that might be on the spammy side? A patent granted to Google this week describes a way that Google might police for such places. The patent warns against titles for business entities that include terms such as “cheap hotels,” “discounts,” and “Dr. ABC–555 777 8888.” It also might identify spam in categories for businesses that might include things such as “City X,” “sale,” “City A B C D,” “Hotel X in City Y,” and “Luxury Hotel in City Y.”
In the context of a business entity, information that skews the identity of or does not accurately represent the business entity or both is considered spam.
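To make the idea of spammy titles a little more concrete, here is a minimal sketch of how a title check along those lines might look. It is not the method from the patent: the term list, the phone number pattern, and the function name spam_signals are all assumptions made for illustration, seeded only with the example terms quoted above.

```python
import re

# Hypothetical term list, seeded with the examples quoted above.
SPAM_TERMS = ("cheap hotels", "discounts", "sale")
# Loose, assumed pattern for a phone number inside a title, e.g. "555 777 8888".
PHONE_PATTERN = re.compile(r"\b\d{3}[\s.-]?\d{3}[\s.-]?\d{4}\b")


def spam_signals(title):
    """Return the reasons a business title looks spammy (empty list if none)."""
    lowered = title.lower()
    reasons = []
    for term in SPAM_TERMS:
        # Match whole words only, so "sale" does not fire on "Salem".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            reasons.append(f"term: {term}")
    if PHONE_PATTERN.search(title):
        reasons.append("phone number in title")
    return reasons


if __name__ == "__main__":
    for name in ("Joe's Pizza", "Cheap Hotels and Discounts", "Dr. ABC–555 777 8888"):
        print(name, "->", spam_signals(name))
```

A real system would presumably weigh signals like these against the other sources of business data rather than rely on a fixed word list.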
I sometimes see people say that paid search is a great way to do keyword research for SEO, but I disagree with that statement. Paid search primarily focuses upon keywords that are transactional in nature; usually the terms chosen are the kind that match an intent to buy something, download something, or take some other kind of action. I’ve asked many people who do search engine advertising and focus on AdWords whether they ever target queries that are informational in nature, and most of the time the answer has been no.
Often searchers will do some research on a product or service before they decide who to buy from. They will perform research to find what kinds of features are available for different products, and try to find reviews or opinions from others. They may try to compare different manufacturers as well. These types of queries are more informational in nature, and the same searcher will conduct these types of queries that evidence an informational intent before they begin to consider a query with a transactional intent.
Rand noted that first page rankings for three different pages, which didn’t seem very much optimized for the queries they were returned for, might be ranked based upon a ranking signal that looks at how words tend to co-occur on pages related to those queries. My post in response explored some reranking approaches by Google that also might account for those rankings, including Phrase Based Indexing, Google’s Reasonable Surfer Model, Named Entity Associations, Category associations involving categories assigned to queries and categories assigned to webpages, and Google’s use of synonyms in place of terms within queries.
Google’s Phrase-Based Indexing approach pays a lot of attention to words (phrases, actually) that appear together, or co-occur, in the top (10/100/1,000) search results for a query, and may boost pages in rankings based upon that co-occurrence. That seemed like a possible reason why those pages might be appearing on the first page of results. The other reranking approaches that I included also seemed like they might be in part or in full responsible for the rankings as well. Then I found a patent granted to Google this week that seems like an even better fit.
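As a rough illustration of the co-occurrence idea described above, the sketch below finds phrases that show up in a large share of the top results for a query, then counts how many of those phrases a given page contains. The function names, the min_share threshold, and the simple substring matching are assumptions for the sake of the example, not details taken from Google’s patents.

```python
from collections import Counter


def related_phrases(top_results, candidate_phrases, min_share=0.5):
    """Return candidate phrases found in at least min_share of the top results."""
    counts = Counter()
    for text in top_results:
        lowered = text.lower()
        for phrase in candidate_phrases:
            if phrase in lowered:
                counts[phrase] += 1
    threshold = min_share * len(top_results)
    return {phrase for phrase, n in counts.items() if n >= threshold}


def cooccurrence_boost(page_text, related):
    """Count how many of the related phrases appear on a page."""
    lowered = page_text.lower()
    return sum(1 for phrase in related if phrase in lowered)


if __name__ == "__main__":
    top = [
        "The White House press office",
        "President at the White House",
        "Rose garden tour",
    ]
    related = related_phrases(top, ["white house", "president", "rose garden"])
    print(related, cooccurrence_boost("Visit the White House today", related))
```

A page that contains more of the phrases common to the top results would get a higher count, which a reranking step could then treat as a boost.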
Last Friday, in a well-received and thoughtful Whiteboard Friday at SEOmoz titled Prediction: Anchor Text is Dying…And Will Be Replaced by Co-citation (since retitled at SEOmoz as Prediction: Anchor Text is Weakening…And May Be Replaced by Co-Occurrence), Rand Fishkin described how some unusual search results caused him to question how Google was ranking some results.
I’m a big fan of looking at and trying to analyze and understand search results for specific queries, especially when they include results that appear somewhat puzzling, and I think those provide some great fodder for discussions about how Google might be ranking some search results. Thanks, Rand.
If I were to tell you that the major search engines have a bigger and richer database full of information than their index of the World Wide Web, would you believe me? Chances are that you’re one of the people who helped build it. The information that Google and Bing and Yahoo collect about the searches and query sessions and clicks that searchers perform on the Web covers an incredible number of searches a day. When Google introduced their Knowledge Graph this past May, they gave us a hint of the scope and usage of this database:
For example, the information we show for Tom Cruise answers 37 percent of next queries that people ask about him. In fact, some of the most serendipitous discoveries I’ve made using the Knowledge Graph are through the magical “People also search for” feature.
When someone performs a search for a query that doesn’t produce many results at Google or Bing, the search engines might remove some of the query terms to provide more results, or they might look for synonyms that might help fill the same or a similar informational need. But chances are that such approaches still might not produce the kinds of results that searchers want to see.
Can social networking rankings influence which users’ profiles and interactions get crawled and then indexed first by a search engine crawling program? A Microsoft patent application asks and answers that question. Is it something that Bing is using, or will use?
Importance Metrics for Prioritizing Crawls
Back in the early days of Google, PageRank wasn’t just a way of ranking pages based upon the quality and quantity of links pointed to your pages. Google also used PageRank as one of the importance metrics used to decide which pages to prioritize when they had to choose which URLs to crawl first. The paper, Efficient Crawling Through URL Ordering (pdf), co-authored by Google Founder Lawrence Page, pointed to a few other metrics that were used to decide which URLs to visit first on a crawl, including PageRank. Another of those looked at how close a page is to the root directory of a site. The idea behind that one is that it’s better to index a million different home pages than it is to index a million pages on one site.
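The distance-from-root metric mentioned above is easy to sketch. The example below orders a crawl queue by how many path segments each URL has, so home pages come first. It covers only that one metric and leaves PageRank out entirely; the function names and example URLs are made up for illustration.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Number of non-empty path segments; a home page has depth 0."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def order_crawl_queue(urls):
    """Sort URLs so that pages closest to the site root are crawled first."""
    # sorted() is stable, so URLs at the same depth keep their original order.
    return sorted(urls, key=url_depth)


if __name__ == "__main__":
    queue = [
        "http://a.example/x/y/z.html",
        "http://b.example/",
        "http://c.example/about",
    ]
    for url in order_crawl_queue(queue):
        print(url_depth(url), url)
```

A crawler working from several importance metrics would likely blend a depth score like this with link-based scores rather than sort on depth alone.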
With the growth of social networks and an incredible amount of user generated content that comes with them, there’s a lot less reliance upon links, and yet search engines want to crawl and index as much content from those types of sites as well. The lack of links to those means that something like PageRank is out of the question – and probably would be if we were talking about Google, too. Search engines don’t just want to crawl and then index user profiles, but also the things users of those networks post and the conversations that they have. Why not focus upon crawling content from people who are more active on those social networks?
Social networking content should be relevant and recent when shown in search results. But the ranking of that social content is an area that is fairly new to social networks, and something for which there are really no established methods. A search engine can grab a crawl list from a social network, with the URLs of pages and posts and pictures to crawl, but where should it start? Such a crawl list can even be easy to retrieve, especially in cases where a social network like Twitter might turn over an XML feed to a search engine. But again, where to begin?
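One simple way to picture an answer to "where to begin?" is to crawl content from the most active members first. The sketch below is only an illustration of that idea, not the approach in the Microsoft patent application: the activity scores, the (url, author) crawl list format, and the function name are all hypothetical.

```python
import heapq


def prioritize_by_activity(crawl_list, activity):
    """Yield URLs from the most active author to the least active.

    crawl_list: iterable of (url, author) pairs.
    activity: dict mapping author -> hypothetical activity score.
    """
    # heapq is a min-heap, so the score is negated; the list index breaks
    # ties in favor of the original crawl list order.
    heap = [
        (-activity.get(author, 0), index, url)
        for index, (url, author) in enumerate(crawl_list)
    ]
    heapq.heapify(heap)
    while heap:
        _, _, url = heapq.heappop(heap)
        yield url


if __name__ == "__main__":
    crawl = [("u1", "ann"), ("u2", "bob"), ("u3", "cy")]
    scores = {"ann": 2, "bob": 9}  # "cy" has no recorded activity
    print(list(prioritize_by_activity(crawl, scores)))
```

Authors with no recorded activity default to a score of zero here, so their content is crawled last.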
Can the quality of links that your pages or videos or other documents link to influence the ranking of your pages, based upon a reachability score? A newly granted patent from Google describes how the search engine might look at linked documents and other resources reachable from a page or video or image to determine such a reachability score.
Pages might be promoted (boosted) or demoted in search results for a query based upon that reachability score, which is calculated from a number of different factors.
Someone clicks on a search result, and while there they find links to other resources that they might click upon. Different user behaviors recorded by a search engine might be monitored to determine how people interact with the first, or primary resource visited, and similar user behavior signals may also be looked at for pages or videos or other resources linked to from that resource. Reachability scores might also be calculated for those secondary resources linked to from the first resource, looking at the third or tertiary pages and other resources linked to from the secondary resources.
Calculating reachability scores may follow a process like the following:
Did Google sidestep a lawsuit with an acquisition of patents involving electronic phone payments?
One initiative that Google has been hard at work on is making it easy for people to make payments electronically by phone. The Google Wallet has been available as an Android app on some phones, and it looks like it’s been moving beyond the need to use near field communications (NFC) to make payments.
Last year, on September 8, 2011, E-Micro Corporation filed a patent infringement lawsuit against a group of defendants, including: Google, Inc., Samsung Electronics Co., Ltd., Samsung Electronics America, Inc., Samsung Telecommunications America, L.L.C., Sprint Nextel Corporation, Sprint Spectrum L.P., Nextel Operations, Inc., Sprint Solutions, Inc., Amazon.com, Inc., Best Buy Co., Inc. and BBY Solutions, Inc.