If I were to tell you that the major search engines have a bigger and richer database full of information than their index of the World Wide Web, would you believe me? Chances are that you’re one of the persons who helped build it. The information that Google and Bing and Yahoo collect about the searches and query sessions and clicks that searchers perform on the Web covers an incredible number of searches a day. When Google introduced their Knowledge Graph this past May, they gave us a hint of the scope and usage of this database:
For example, the information we show for Tom Cruise answers 37 percent of next queries that people ask about him. In fact, some of the most serendipitous discoveries I’ve made using the Knowledge Graph are through the magical “People also search for” feature.
When someone performs a search for a query that doesn’t produce many results at Google or Bing, the search engines might remove some of the query terms to provide more results, or they might look for synonyms that might help fill the same or a similar informational need. But chances are that such approaches still might not produce the kinds of results that searchers want to see.
Can social networking rankings influence which users’ profiles and interactions get crawled and then indexed first by a search engine crawling program? A Microsoft patent application asks and answers that question. Is it something that Bing is using, or will use?
Importance Metrics for Prioritizing Crawls
Back in the early days of Google, PageRank wasn’t just a way of ranking pages based upon the quality and quantity of links pointed to your pages. Google also used PageRank as one of the importance metrics used to decide which pages to prioritize when they had to choose which URLs to crawl first. The paper, Efficient Crawling Through URL Ordering (pdf), co-authored by Google Founder Lawrence Page, pointed to a few other metrics that were used to decide which URLs to visit first on a crawl, including PageRank. Another of those looked at how close a page is to the root directory of a site. The idea behind that one is that it’s better to index a million different home pages than it is to index a million pages on one site.
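To make the idea of ordering a crawl by importance metrics a little more concrete, here is a minimal Python sketch. It is not taken from the paper: the function names `path_depth` and `crawl_order`, the example URLs, and the choice to sort by distance from the root first and then break ties with a PageRank-style score are all my own assumptions for illustration.

```python
import heapq
from urllib.parse import urlparse


def path_depth(url: str) -> int:
    """Count non-empty path segments; a home page has depth 0."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def crawl_order(urls, pagerank=None):
    """Return URLs shallowest first; ties go to the higher score.

    `pagerank` is a hypothetical dict of URL -> importance score.
    Combining the two metrics this way is an assumption, not the
    method described in the paper.
    """
    pagerank = pagerank or {}
    # Negate the score so that larger scores pop first from the min-heap.
    heap = [(path_depth(u), -pagerank.get(u, 0.0), u) for u in urls]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

Called with a mix of home pages and deeper pages, `crawl_order` puts every home page ahead of any deeper page, which mirrors the "a million different home pages" intuition above.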
With the growth of social networks and an incredible amount of user generated content that comes with them, there’s a lot less reliance upon links, and yet search engines want to crawl and index as much content from those types of sites as well. The lack of links to those means that something like PageRank is out of the question – and probably would be if we were talking about Google, too. Search engines don’t just want to crawl and then index user profiles, but also the things users of those networks post and the conversations that they have. Why not focus upon crawling content from people who are more active on those social networks?
Social networking content should be relevant and recent when shown in search results. But the ranking of that social content is an area that is fairly new to social networks, and something that there are really no established methods for. A search engine can grab a crawl list from a social network, with the URLs of pages and posts and pictures to crawl, but where should it start? Such a crawl list can even be easy to retrieve, especially in cases like when a social network like Twitter might turn over an XML feed to a search engine. But again, where to begin?
Can the quality of links that your pages or videos or other documents link to influence the ranking of your pages, based upon a reachability score? A newly granted patent from Google describes how the search engine might look at linked documents and other resources reachable from a page or video or image to determine such a reachability score.
Search rankings might be promoted (boosted) or demoted in search results for a query based upon that reachability score, which is calculated from a number of different factors.
Someone clicks on a search result, and while there they find links to other resources that they might click upon. Different user behaviors recorded by a search engine might be monitored to determine how people interact with the first, or primary resource visited, and similar user behavior signals may also be looked at for pages or videos or other resources linked to from that resource. Reachability scores might also be calculated for those secondary resources linked to from the first resource, looking at the third or tertiary pages and other resources linked to from the secondary resources.
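One way to picture how a score could roll up from secondary and tertiary resources is the small Python sketch below. It is purely illustrative and not the patent's formula: the `links` and `engagement` dictionaries, the `decay` weight, and the averaging step are all assumptions I've made to show the general shape of the idea.

```python
def reachability(resource, links, engagement, depth=2, decay=0.5):
    """Illustrative reachability score for `resource`.

    links:      hypothetical dict of resource -> list of linked resources
    engagement: hypothetical dict of resource -> user behavior signal (0..1)
    depth:      how many link hops to follow (2 = secondary and tertiary)
    decay:      assumed weight given to resources one hop further away
    """
    targets = links.get(resource, [])
    if depth == 0 or not targets:
        return 0.0
    total = 0.0
    for target in targets:
        # A linked resource contributes its own signal plus a decayed
        # share of the reachability of whatever it links to.
        total += engagement.get(target, 0.0) + decay * reachability(
            target, links, engagement, depth - 1, decay
        )
    return total / len(targets)
```

The depth limit also keeps the recursion from looping forever when pages link back to each other.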
Calculating reachability scores may follow a process like the following:
Did Google sidestep a lawsuit with an acquisition of patents involving electronic phone payments?
One initiative that Google has been hard at work on is making it easy for people to make payments electronically by phone. The Google Wallet has been available as an Android app on some phones, and it looks like it’s been moving beyond the need to use near field communications (NFC) to make payments.
Last year, on September 8, 2011, E-Micro Corporation filed a patent infringement lawsuit against a group of defendants, including: Google, Inc., Samsung Electronics Co., Ltd., Samsung Electronics America, Inc., Samsung Telecommunications America, L.L.C., Sprint Nextel Corporation, Sprint Spectrum L.P., Nextel Operations, Inc., Sprint Solutions, Inc., Amazon.com, Inc., Best Buy Co., Inc. and BBY Solutions, Inc.
Imagine that a search engine might insert place markers into a web page, perhaps with the use of something like the new Google Tag Manager? These markers could enable a search engine to calculate how long it might take someone to read that page. A newly granted patent from Google describes why they might insert such markers (without really telling us how it might insert those), to determine the reading speed of a page.
The process described by the patent might try to understand how different features associated with a page might cause it to take less time or more time for a visitor to read a page. It would then use that understanding to predict how such features might influence the reading of other pages that don’t have markers inserted into them. These types of features could include language, layout, topic, and the length of text of those documents. These are all things that could affect traffic across the web or at specific websites.
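As a rough illustration of learning from marked pages and predicting for unmarked ones, here is a minimal Python sketch. It uses only one of the features mentioned above, the length of the text, and fits a single seconds-per-word rate with a least-squares line through the origin. The function names and the sample numbers are hypothetical, and the patent may well model this very differently.

```python
def fit_seconds_per_word(marked_pages):
    """Fit seconds ~= rate * words by least squares through the origin.

    marked_pages: list of (word_count, measured_seconds) pairs taken
    from pages where reading time could be measured (hypothetical data).
    """
    numerator = sum(words * seconds for words, seconds in marked_pages)
    denominator = sum(words * words for words, _ in marked_pages)
    return numerator / denominator


def predict_reading_seconds(word_count, rate):
    """Estimate reading time for a page that has no markers."""
    return word_count * rate


# Made-up measurements from pages with markers.
marked = [(100, 30.0), (200, 60.0), (400, 120.0)]
rate = fit_seconds_per_word(marked)
estimate = predict_reading_seconds(1000, rate)
```

A fuller version would add the other features (language, layout, topic) as extra inputs, but the fit-then-predict pattern stays the same.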
Some days Google seems like it’s more of a science fiction factory than a search engine, developing products like driverless cars, and augmented reality glasses. An academic project at Berkeley adds another element to the mix – Robots. Robots that can help pick up commonplace objects around your home, and put them in their proper places.
A paper submitted to the IEEE International Conference on Robotics and Automation, to be held in Karlsruhe, Germany in May 2013, describes the role that Google’s visual search queries play in helping robots understand the objects that they might try to pick up, before they do. In Cloud-Based Robot Grasping with the Google Object Recognition Engine, we’re told about cloud-based robots that can view objects, and send queries about them to a version of Google Goggles on the cloud to learn more about those objects and the best way to grasp them.
Google Goggles is Google’s visual search app, which enables you to take photographs and send them to Google to potentially perform facial recognition searches, OCR searches for text in images, product and bar code recognition, recognizing landmarks and places and named entities, and more. I spent a few hours at my Mom and Dad’s house a couple of weekends ago taking pictures of almost every photo and painting they had on their walls, and seeing if Google Goggles recognized any of them.
Another feature that the visual search engine is capable of is recognizing objects, and the Berkeley team, with the assistance of James Kuffner of Google, appears to have achieved a goal that had eluded them in the past with the use of Google Goggles. From the paper’s introduction:
Google’s local search may be getting smarter one streetview scene at a time. A few years back, I jokingly made a robots.txt sign for my front door that had the following statement in it:
In the root level directory of a website, a robots.txt file containing those two lines would tell Google’s page crawling program not to index any pages from the site. On the front of a home in my small town, it might have gotten some odd looks, but that’s about it. I had expected at some point that Google would send a streetview car or two down my street, and I would have been able to write a blog post with a streetviews image of the front of my house with a title along the lines of “Google Ignores Robots.txt File: Indexes My House.” I ended up not leaving the sign up, but I’m second guessing that now that I know streetviews cars can read.
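For anyone curious how a crawler reads rules like that, here is a small Python sketch using the standard library's `urllib.robotparser`. The two lines in `ASSUMED_RULES` are my assumption of a typical block-everything file, not necessarily the exact text on the sign, and `is_crawl_allowed` and the example.com URL are hypothetical names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed two-line file that blocks every crawler from the whole site.
ASSUMED_RULES = ["User-agent: *", "Disallow: /"]


def is_crawl_allowed(rules, user_agent, url):
    """Return True if `user_agent` may fetch `url` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse rule lines directly, no network fetch
    return parser.can_fetch(user_agent, url)


blocked = is_crawl_allowed(ASSUMED_RULES, "Googlebot", "http://example.com/page")
```

Strictly speaking, a rule like this asks a crawler not to fetch pages, which in turn keeps their content out of the index.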
That really shouldn’t have been a surprise back then. I wrote a post in 2007 titled Better Business Location Search using OCR with Street Views which described how Google might use OCR to gather information from signs it captures on video for street views. The patent filing I wrote about really didn’t discuss how that information might be used, but it presented the possibility of its use. I suspect my real life robots.txt file would have been ignored back then, though the drivers of those cars had learned at that point that signs like “Private Street” and “Military Base” marked areas they couldn’t film.
Google was granted a patent last week that gives us a look at how information from street level signs might be collected and indexed by Google, and compared to online information about the same locations to try to “calibrate” and “score” any information about the places being listed in Google’s index. Here’s an image from the patent that shows at a glance the kinds of information it might attempt to read:
I’m on the second day of a trip to New York City, giving presentations at SMX East on both the potential impact of mobile devices on the future of search, and on how reputation and authority signals might impact the rankings and visibility of authors and publishers and commenters on the Web.
My first presentation was in the “local and mobile” track of the conference as part of a session titled “Meet Siri: Apple’s Google Killer?” where I joined Bryson Meunier, Will Scott, Andrew Shotland, and moderator Greg Sterling in discussing the potential impact of Apple’s Siri and voice search on SEO and search.
When I read the title for this proposed session a couple of months back, I couldn’t help but start to draft a pitch to join in on the conversation. I’ve been carefully watching patents and papers from Google and Apple and others about inventions and interfaces that might transform the way we search in the future, and the way that people might share information and market businesses online.