Last July, a Google Blog post titled More Wood Behind Fewer Arrows announced the closing of Google Labs, where a number of experimental projects taking place at Google were available for the public to explore and try out. Many of those projects sprouted out of Google’s 20 percent time approach, where engineers are encouraged to spend one day a week, or 20 percent of their time, working on projects that aren’t necessarily part of their job descriptions. Among the projects that started out as 20 percent time efforts are Gmail, AdSense for content, Orkut, and Google Suggest. We’ve been told that the 20 percent initiative isn’t going away, but Google seems to be growing a little more secretive.
When Eric Schmidt stepped down as CEO of Google and Larry Page took over that role, co-founder Sergey Brin’s position at the company was redefined as well, and we were told that he would be in charge of “special projects” at Google. A New York Times article published in November of last year told us about Google’s Lab of Wildest Dreams, or a “top-secret lab in an undisclosed Bay Area location where robots run free,” referred to as Google X. This is the home of Google’s driverless cars. It’s a place where “shoot for the stars” type technology is being explored.
It might also now be the home to a project that has roots in a technology essential to the laying of the transatlantic cable back in the 1860s, developed by Oliver Heaviside.
Has an improvement in how Google understands the layout of pages, and how it understands and classifies the different elements found on them, had an impact on the titles and snippets that we see in search results? Google may classify queries to decide what to show for page titles and snippets in search results, but it’s possible that it also classifies the contents of original titles, snippets, and URLs when deciding whether to show different titles and expanded snippets. Might Google do that in combination with a classification of page elements (portions of HTML containing some text) found on the pages in search results, to try to determine the best representation of a search result in response to a query?
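To make the idea concrete, here is a minimal sketch of how candidate page elements might be scored against a query when choosing a snippet. This is purely illustrative, not Google’s actual method; the function names, the term-overlap scoring, and the sample candidates are all assumptions.

```python
import re

def term_overlap(query, text):
    """Fraction of query terms that appear in the candidate text."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    t_terms = set(re.findall(r"\w+", text.lower()))
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0

def choose_snippet(query, candidates):
    """Pick the candidate block of page text that best matches the query."""
    return max(candidates, key=lambda text: term_overlap(query, text))

# Toy example: two blocks of text extracted from a page.
candidates = [
    "Our company history and mission statement",
    "Widget pricing: compare widget plans and costs",
]
print(choose_snippet("widget pricing plans", candidates))
# prints the second candidate
```

A real system would presumably weigh far more signals (element type, position on the page, query classification), but the basic shape of the decision is a scored choice among candidate blocks like this one.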
Google May Choose Titles and Snippets for Pages
When you search at Google, the search results displayed for web pages include titles, URLs, and snippets for the pages listed in the results. The query terms you used, or sometimes synonyms for them, may be included in the title and snippet, and Google will highlight those terms. As a site owner, you should have unique and engaging titles and meta descriptions for each page you want indexed by search engines. Not only does that make it more likely that search engines will crawl, index, and display those pages, but if you use the keywords you’re optimizing those pages for within those titles and descriptions, Google may show your choice of title and meta description within search results.
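The highlighting of query terms in a snippet can be sketched in a few lines. This is a simplified illustration of the behavior described above, assuming a simple case-insensitive whole-word match; real snippet generation also handles synonyms and stemming.

```python
import re

def highlight(snippet, query_terms):
    """Wrap query terms (case-insensitive, whole words) in <b> tags,
    similar to how search results bold matched terms."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, query_terms)) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(r"<b>\1</b>", snippet)

print(highlight("Learn how Google chooses titles", ["google", "titles"]))
# → Learn how <b>Google</b> chooses <b>titles</b>
```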
Somewhere in an alternative universe, it’s possible that one of the most feared hitters in baseball might have instead been known as one of its greatest pitchers. Babe Ruth started out as a pitcher for the Boston Red Sox in 1914, and when approached about getting his bat into the lineup on a daily basis in 1918, his manager Ed Barrow responded that “I’d be the laughingstock of baseball if I took the best lefthander in the league and put him in the outfield.” A couple of years later, Ruth was sold to New York’s team for an unprecedented $125,000 where he proceeded to hit 54 home runs for the Yankees, and begin a pretty good career hitting a baseball instead of throwing it at people.
In 1920, anyone looking for information about the Babe probably wasn’t too interested in his pitching career. Likewise, when someone searches today for [world series champion], it’s likely that they are looking for fresh results. How does a search engine like Google determine when searchers might prefer fresh results, and when they might prefer older results?
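One simple way a system might lean toward fresh results for certain queries is to blend a document’s topical relevance with a recency score that decays over time. The following sketch is an assumption for illustration only, with an arbitrary half-life and blending weight; it is not how Google actually scores freshness.

```python
def freshness_score(age_days, half_life_days=30.0):
    """Exponential decay: a page half_life_days old scores half
    of a brand-new one."""
    return 0.5 ** (age_days / half_life_days)

def blended_score(relevance, age_days, freshness_weight=0.5):
    """Blend topical relevance with recency for fresh-leaning queries.
    freshness_weight might be raised for queries classified as
    'deserving freshness' and lowered for evergreen queries."""
    return (1 - freshness_weight) * relevance + freshness_weight * freshness_score(age_days)

# A moderately relevant but new page can outscore a very relevant old one:
print(blended_score(relevance=0.7, age_days=1))
print(blended_score(relevance=0.9, age_days=365))
```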
A patent application was published today which describes the kind of intelligent automated assistant that we see in use on Apple’s iPhone 4S, known as Siri. But the patent isn’t necessarily limited to the iPhone application itself, and it describes how such a system could be used in a number of ways, including with mobile phones, PDAs, tablets, game consoles, embedded computer systems in cars, and possibly others. This assistant might provide information and services on a single client device or multiple devices, and possibly in combination with applications and information on servers as well.
It could also act as an active participant in messaging platforms such as email, instant messaging, discussion forums, group chat sessions, and customer support sessions.
Yesterday, I wrote about how Google may be looking at the semantics associated with HTML heading elements, and the content that they head, and how the search engine might be looking at such content with similar headings across the Web to determine how much weight to give words and phrases within those headings.
That post was originally part of the introduction to this post, but it developed a life of its own, and I ran with it. Here, we’re going to look at semantics related to other HTML structures, primarily lists and tables.
I’m going to bundle a handful of patents together for this choice of one of the 10 most important SEO patents, since I think they work together to illustrate how a search engine might use semantic structures to learn about how words and concepts might be related to each other on the Web. Some of these patents are older, and one of them is a pending patent application published this week. I’m also going to include a number of white papers which help define a process that might seem to be very much behind the scenes at Google. I’m going to focus upon Google with this post, though I expect that similar things may also be happening at other search engines.
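As a concrete illustration of mining semantic structures, consider how items appearing in the same HTML list could be collected and treated as related terms. This is a toy sketch using Python’s standard-library parser, assuming only that co-membership in a list is a hint of relatedness; the patents describe far more sophisticated processing.

```python
from html.parser import HTMLParser

class ListItemCollector(HTMLParser):
    """Collect the text of <li> items so that terms appearing in the
    same list can be treated as candidates for related concepts."""
    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "li" and self._in_li:
            self.items.append("".join(self._buf).strip())
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self._buf.append(data)

parser = ListItemCollector()
parser.feed("<ul><li>Mercury</li><li>Venus</li><li>Earth</li></ul>")
print(parser.items)  # ['Mercury', 'Venus', 'Earth']
```

Seeing the same set of items recur in lists across many pages is one way a search engine could begin to infer that those items belong to a common class, such as planets.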
How important are heading elements to the rankings of webpages by search engines?
I’ve seen arguments by people who write about and study search engines and SEO very closely, often written up in “SEO Expert Ranking Lists,” that HTML heading elements (<h1>, <h2>, etc.) are very important, arguments that heading elements were once important and are no longer, and arguments that heading elements were never important. Sadly, all of those arguments are likely wrong. Not so much about the importance or lack of it, but rather about the reasons for that importance.
It’s possible that a search engine might notice when a word, term, or phrase appears near the top of a page, or above a wall of text. It’s also possible that a search engine pays attention when those are shown in larger font sizes, or bolder than the rest of the page text, or in a different font than the remainder of the words on the page. But that prominence and that display aren’t really what a heading element is about. HTML and CSS offer ways to make text larger, and ways to make it bold. Any words near the top of a page might be said to be more prominent than others.
You can use many HTML attributes and Cascading Style Sheet properties to make words within different HTML elements bolder and larger, to transform them to all capitals, or to give them a different font or color, or all of those if you want. You can purposefully place certain text at the top of a page to make it appear that the rest of the page is described by those words.
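The distinction being drawn here is between semantic markup and mere styling, and it can be sketched as a weighting scheme where a term inside a true heading element counts for more than the same term that is merely bolded. The element weights below are invented for illustration; no search engine has published such numbers.

```python
# Illustrative weights only: heading tags carry semantic meaning that
# styling (bold text, large fonts) alone does not. These numbers are
# assumptions, not values used by any actual search engine.
ELEMENT_WEIGHTS = {"h1": 3.0, "h2": 2.0, "h3": 1.5, "b": 1.1, "p": 1.0}

def term_weight(term_count, element):
    """Weight a term's occurrences by the element that contains them."""
    return term_count * ELEMENT_WEIGHTS.get(element, 1.0)

# A term appearing twice in an <h1> could count for more than the same
# term appearing twice in bolded body text:
print(term_weight(2, "h1"))  # 6.0
print(term_weight(2, "b"))   # 2.2
```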
In the last installment of this series, we looked at how Google may be using phrase-based indexing to take advantage of the fact that many phrases tend to co-occur with other phrases within the content of web pages, and to re-rank those pages accordingly. When we look at phrases, we also need to drill down to a special set of phrases describing named entities, or specific people, places, or things. In addition to trying to understand which phrases might tend to co-occur with those named entities, the search engines may look to other sources such as Wikipedia, Freebase from Metaweb, the Internet Movie Database (IMDB), and different map databases to attempt to understand when a phrase indicates an actual (or fictional) entity.
Google, Bing, and Yahoo all look for named entities on web pages and in search queries, and will use their recognition of named entities to do things like answer questions such as “where was Barack Obama born?”
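At its simplest, recognizing a named entity in a query can be sketched as a gazetteer lookup: matching multi-word phrases in the text against a list of known entities, much as a search engine might match against titles drawn from Wikipedia or Freebase. The dictionary and matching logic below are toy assumptions for illustration.

```python
# Toy gazetteer mapping known entity phrases to types; a real system
# would draw on sources like Wikipedia, Freebase, or map databases.
KNOWN_ENTITIES = {"barack obama": "person", "new york": "place"}

def find_entities(text, max_len=3):
    """Scan text for the longest known-entity phrase starting at each word."""
    words = text.lower().split()
    found = []
    for i in range(len(words)):
        for n in range(max_len, 0, -1):  # prefer longer matches
            phrase = " ".join(words[i:i + n])
            if phrase in KNOWN_ENTITIES:
                found.append((phrase, KNOWN_ENTITIES[phrase]))
                break
    return found

print(find_entities("Where was Barack Obama born"))
# [('barack obama', 'person')]
```

Once the entity in a query like “where was Barack Obama born?” is identified, the engine can route the question to structured data about that entity rather than treating the words as ordinary keywords.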
Looks like Google and IBM are working together again to build up Google’s patent portfolio, from an update at the United States Patent and Trademark Office (USPTO) patent assignment database. Details beyond the actual patents involved aren’t known yet. The last couple of times I wrote about large patent transactions between Google and IBM this past July and September, Google ended up sending out emails a few hours after my posts to a number of large media sites such as the New York Times, Bloomberg News, the Wall Street Journal, and a number of others disclosing the acquisitions. We’ll see if they do that again.
The last week of 2011, Google acquired 188 granted patents and 29 published pending patent applications from IBM, according to the USPTO assignment database, with an execution date on the assignment of the patents of December 28, 2011, in a deal that was officially recorded at the patent office on December 30, 2011.
The patents cover a broad range of topics, such as presentation software, blade servers, data caching, server load balancing, network performance, video conferencing, email administration, and instant messaging applications. A number of the patents cover specific internet, phone, and mobile phone technologies as well.