Rand noted that three different pages ranking on the first page of results, which didn’t seem well optimized for the queries that returned them, might be ranked based upon a signal that looks at how words tend to co-occur on pages related to those queries. My post in response explored some reranking approaches from Google that might also account for those rankings, including Phrase-Based Indexing, Google’s Reasonable Surfer Model, Named Entity Associations, category associations (matching categories assigned to queries with categories assigned to webpages), and Google’s use of synonyms in place of terms within queries.
Google’s Phrase-Based Indexing approach pays a lot of attention to words (phrases, actually) that appear together, or co-occur, in the top (10/100/1,000) search results for a query, and may boost pages in rankings based upon that co-occurrence. That made it seem like a possible reason why those pages might be appearing on the first page of results. The other reranking approaches I included also seemed like they might be in part or in full responsible for the rankings. Then I found a patent granted to Google this week that seems like an even better fit.
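To make the co-occurrence idea concrete, here is a minimal, hypothetical sketch: it counts bigrams that recur across the top results for a query, then credits a candidate page for containing those shared phrases. Google’s actual phrase-based indexing is far more involved (good-phrase lists, related-phrase clusters, and so on), so the function name and scoring here are illustrative assumptions only.

```python
from collections import Counter

def cooccurrence_boost(top_results, candidate_text, n=2):
    """Count phrases (word n-grams) shared between the candidate page and
    phrases that co-occur across multiple top-ranked results.

    This is a toy stand-in for phrase-based indexing, not the patented method.
    """
    counts = Counter()
    for text in top_results:
        words = text.lower().split()
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    # Phrases that appear in more than one top result "co-occur" for the query.
    common = {phrase for phrase, c in counts.items() if c > 1}
    words = candidate_text.lower().split()
    candidate = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    return len(common & candidate)

top = [
    "cell phone plans compared",
    "best cell phone plans",
    "phone plans for families",
]
print(cooccurrence_boost(top, "compare cell phone plans today"))
```

A page sharing more of those co-occurring phrases would, under this toy model, receive a larger boost.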
Last Friday, in a well-received and thoughtful Whiteboard Friday at SEOmoz, originally titled Prediction: Anchor Text is Dying…And Will Be Replaced by Co-citation and later retitled Prediction: Anchor Text is Weakening…And May Be Replaced by Co-Occurrence, Rand Fishkin described how some unusual search results caused him to question how Google was ranking some results.
I’m a big fan of looking at and trying to analyze and understand search results for specific queries, especially when they include results that appear somewhat puzzling, and I think those provide some great fodder for discussions about how Google might be ranking some search results. Thanks, Rand.
If I were to tell you that the major search engines have a bigger and richer database of information than their index of the World Wide Web, would you believe me? Chances are that you’re one of the people who helped build it. The information that Google and Bing and Yahoo collect about the searches and query sessions and clicks that searchers perform on the Web covers an incredible number of searches a day. When Google introduced their Knowledge Graph this past May, they gave us a hint of the scope and usage of this database:
For example, the information we show for Tom Cruise answers 37 percent of next queries that people ask about him. In fact, some of the most serendipitous discoveries I’ve made using the Knowledge Graph are through the magical “People also search for” feature.
When someone performs a search for a query that doesn’t produce many results at Google or Bing, the search engines might remove some of the query terms to provide more results, or they might look for synonyms that could fill the same or a similar informational need. But chances are that such approaches still might not produce the kinds of results that searchers want to see.
Can social networking rankings influence which users’ profiles and interactions get crawled and then indexed first by a search engine crawling program? A Microsoft patent application asks and answers that question. Is it something that Bing is using, or will use?
Importance Metrics for Prioritizing Crawls
Back in the early days of Google, PageRank wasn’t just a way of ranking pages based upon the quality and quantity of links pointed to your pages. Google also used PageRank as one of the importance metrics used to decide which pages to prioritize when they had to choose which URLs to crawl first. The paper Efficient Crawling Through URL Ordering (pdf), co-authored by Google founder Lawrence Page, pointed to a few other metrics used to decide which URLs to visit first on a crawl, in addition to PageRank. Another of those looked at how close a page is to the root directory of a site. The idea behind that one is that it’s better to index a million different home pages than it is to index a million pages on one site.
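A simple way to picture this kind of URL ordering is a priority queue keyed on importance metrics. The sketch below is an assumption-laden toy, not the paper’s method: it orders a crawl frontier by a precomputed PageRank-like score (the `pagerank` dict here is hypothetical), using distance from the root directory as a tiebreaker.

```python
import heapq

def directory_depth(url: str) -> int:
    """Number of path segments below the site root (fewer = closer to root)."""
    rest = url.split("://", 1)[-1].split("/", 1)
    if len(rest) == 1:
        return 0
    return len([seg for seg in rest[1].split("/") if seg])

def crawl_order(urls, pagerank):
    """Yield URLs in crawl-priority order: higher importance score first,
    with pages closer to the root directory breaking ties."""
    heap = [(-pagerank.get(u, 0.0), directory_depth(u), u) for u in urls]
    heapq.heapify(heap)
    while heap:
        _, _, url = heapq.heappop(heap)
        yield url

urls = ["http://a.com/x/y/page.html", "http://b.com/", "http://a.com/"]
pr = {"http://b.com/": 0.5, "http://a.com/": 0.5}
print(list(crawl_order(urls, pr)))
```

The home pages come off the queue before the deeply nested page, matching the intuition that breadth across sites beats depth within one.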
With the growth of social networks and the incredible amount of user-generated content that comes with them, there’s a lot less reliance upon links, and yet search engines want to crawl and index as much content from those types of sites as well. The lack of links to that content means that something like PageRank is out of the question – and probably would be even if we were talking about Google. Search engines don’t just want to crawl and then index user profiles, but also the things users of those networks post and the conversations that they have. Why not focus upon crawling content from people who are more active on those social networks?
Social networking content should be relevant and recent when shown in search results. But the ranking of that social content is an area that is fairly new to social networks, and one for which there are really no established methods. A search engine can grab a crawl list from a social network, with the URLs of pages and posts and pictures to crawl, but where should it start? Such a crawl list can even be easy to retrieve, especially when a social network like Twitter turns over an XML feed to a search engine. But again, where to begin?
Can the quality of links that your pages or videos or other documents link to influence the ranking of your pages, based upon a reachability score? A newly granted patent from Google describes how the search engine might look at linked documents and other resources reachable from a page or video or image to determine such a reachability score.
Search rankings might be promoted (boosted) or demoted in search results for a query based upon that reachability score calculated based upon a number of different factors.
Someone clicks on a search result, and while there they find links to other resources that they might click upon. Different user behaviors recorded by a search engine might be monitored to determine how people interact with the first, or primary, resource visited, and similar user behavior signals may also be looked at for pages or videos or other resources linked to from that resource. Reachability scores might also be calculated for those secondary resources linked to from the first resource, looking at the third, or tertiary, pages and other resources linked to from the secondary resources.
Calculating reachability scores may follow a process like the following:
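As one way to picture that primary/secondary/tertiary recursion, here is a hypothetical sketch: a resource’s reachability score combines its own user-behavior signal with the decayed scores of the resources reachable from it, down to a fixed depth. The `behavior` values, the `decay` weight, and the averaging are all assumptions of mine, not the patent’s actual factors.

```python
def reachability_score(url, links, behavior, depth=2, decay=0.5):
    """Toy reachability score: a resource's own engagement signal plus a
    decayed average of the scores of the resources it links to.

    `links` maps a URL to the URLs it links to; `behavior` maps a URL to a
    0..1 engagement signal (e.g. long clicks on that resource).
    """
    own = behavior.get(url, 0.0)
    outlinks = links.get(url)
    if depth == 0 or not outlinks:
        return own
    linked = [reachability_score(v, links, behavior, depth - 1, decay)
              for v in outlinks]
    return own + decay * sum(linked) / len(linked)

links = {"A": ["B", "C"], "B": ["D"]}
behavior = {"A": 0.8, "B": 0.6, "C": 0.2, "D": 1.0}
print(reachability_score("A", links, behavior))
```

Under this toy model, a page linking out to well-engaged resources earns a higher score than an identical page linking to dead ends, which is the promote/demote intuition the patent describes.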
Did Google sidestep a lawsuit with an acquisition of patents involving electronic phone payments?
One initiative that Google has been hard at work on is making it easy for people to make payments electronically by phone. The Google Wallet has been available as an Android app on some phones, and it looks like it’s been moving beyond the need to use near field communications (NFC) to make payments.
Last year, on September 8, 2011, E-Micro Corporation filed a patent infringement lawsuit against a group of defendants, including: Google, Inc., Samsung Electronics Co., Ltd., Samsung Electronics America, Inc., Samsung Telecommunications America, L.L.C., Sprint Nextel Corporation, Sprint Spectrum L.P., Nextel Operations, Inc., Sprint Solutions, Inc., Amazon.com, Inc., Best Buy Co., Inc. and BBY Solutions, Inc.
Imagine that a search engine might insert place markers into a web page, perhaps with the use of something like the new Google Tag Manager. These markers could enable a search engine to calculate how long it might take someone to read that page. A newly granted patent from Google describes why it might insert such markers (without really telling us how it might insert them), to determine the reading speed of a page.
The process described by the patent might try to understand how different features associated with a page might cause it to take less time or more time for a visitor to read a page. It would then use that understanding to predict how such features might influence the reading of other pages that don’t have markers inserted into them. These types of features could include language, layout, topic, and the length of text of those documents. These are all things that could affect traffic across the web or at specific websites.
Some days Google seems like it’s more of a science fiction factory than a search engine, developing products like driverless cars and augmented reality glasses. An academic project at Berkeley adds another element to the mix – robots that can help pick up commonplace objects around your home and put them in their proper places.
A paper submitted to the IEEE International Conference on Robotics and Automation, to be held in Karlsruhe, Germany in May 2013, describes the role that Google’s visual search plays in helping robots understand the objects that they might try to pick up, before they do. In Cloud-Based Robot Grasping with the Google Object Recognition Engine, we’re told about cloud-based robots that can view objects, and send queries about them to a version of Google Goggles in the cloud to learn more about those objects and the best way to grasp them.
Google Goggles is Google’s visual search app, which enables you to take photographs and send them to Google to potentially perform facial recognition searches, OCR searches for text in images, product and bar code recognition, recognition of landmarks and places and named entities, and more. I spent a few hours at my Mom and Dad’s house a couple of weekends ago taking pictures of almost every photo and painting they had on their walls, and seeing if Google Goggles recognized any of them.
Another thing the visual search engine is capable of is recognizing objects, and the Berkeley team, with the assistance of James Kuffner of Google, appears to have achieved a goal that had eluded them in the past with the use of Google Goggles. From the paper’s introduction: