Yesterday, I wrote about how Google may be looking at the semantics associated with HTML heading elements and the content that they head, and how the search engine might compare that content with content under similar headings across the Web to determine how much weight to give words and phrases within those headings.
That post was originally part of the introduction to this post, but it developed a life of its own, and I ran with it. Here, we’re going to look at semantics related to other HTML structures, primarily lists and tables.
I’m going to bundle a handful of patents together as this entry in my list of the 10 most important SEO patents, since I think they work together to illustrate how a search engine might use semantic structures to learn how words and concepts are related to each other on the Web. Some of these patents are older, and one of them is a pending patent application published this week. I’m also going to include a number of white papers that help describe a process that happens very much behind the scenes at Google. I’m going to focus on Google in this post, though I expect that similar things may be happening at other search engines as well.
How important are heading elements to the rankings of webpages by search engines?
I’ve seen arguments from people who study and write about search engines and SEO very closely, often published in “SEO Expert Ranking Lists,” that HTML heading elements (<h1>, <h2>, etc.) are very important, that heading elements were once important but no longer are, and that heading elements were never important. Sadly, all of those arguments are likely wrong, not so much about the importance or lack of it, but about the reasons for that importance.
It’s possible that a search engine might notice when a word, term, or phrase appears near the top of a page, or above a wall of text. It’s also possible that a search engine pays attention when those words are shown in larger font sizes, in bold, or in a different font than the rest of the words on the page. But that prominence and that display aren’t really what a heading element is about. HTML and CSS offer other ways to make text larger, such as the font-size property with a value of large, and other ways to make it bold, such as the <b> element or the font-weight property. And any words near the top of a page might be said to be more prominent than others.
You can use many HTML attributes and Cascading Style Sheets (CSS) properties to make words within different HTML elements bolder and larger, to transform them to all capitals, or to change their font or color, or all of those if you want. You can also purposefully place certain text at the top of a page to make it appear that the rest of the page is described by those words.
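The difference between presentation and structure can be made concrete. The minimal Python sketch below pairs each heading element (<h1> to <h6>) with the text that follows it until the next heading, and it ignores styling entirely, so large or bold text in a styled span is treated as ordinary content rather than as a heading. This is my own illustration, not a method taken from any Google patent; the names HeadingSectionParser and heading_sections are hypothetical, text before the first heading is simply dropped, and heading levels and nesting are not modeled.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingSectionParser(HTMLParser):
    """Collect (heading text, following text) pairs from HTML markup.

    Only heading *elements* start a new section; CSS styling is ignored.
    """

    def __init__(self):
        super().__init__()
        self.sections = []        # list of [heading_text, [body text chunks]]
        self._in_heading = False
        self._heading_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._in_heading:
            self._in_heading = False
            heading = " ".join("".join(self._heading_parts).split())
            self.sections.append([heading, []])

    def handle_data(self, data):
        if self._in_heading:
            self._heading_parts.append(data)
        elif self.sections:
            # Text after a heading belongs to that heading's section.
            # Text before the first heading is dropped in this sketch.
            self.sections[-1][1].append(data)


def heading_sections(html):
    """Return a list of (heading, content) tuples with whitespace normalized."""
    parser = HeadingSectionParser()
    parser.feed(html)
    parser.close()
    return [(h, " ".join(" ".join(body).split())) for h, body in parser.sections]


if __name__ == "__main__":
    page = ('<h1>Lists and Tables</h1><p>Structured data.</p>'
            '<span style="font-size:large">Not a heading</span>'
            '<h2>Tables</h2><p>Rows and columns.</p>')
    for heading, content in heading_sections(page):
        print(heading, "->", content)
```

Run on that sample page, the styled span ends up inside the content of “Lists and Tables,” because nothing about its size or weight makes it a heading in the markup.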
In the last installment of this series, we looked at how Google may be using phrase-based indexing to re-rank pages, based on the fact that many phrases tend to co-occur with other phrases within the content of web pages. When we look at phrases, we also need to drill down to a special set of phrases that describe named entities, or specific people, places, or things. In addition to trying to understand which phrases tend to co-occur with those named entities, the search engines may look to other sources such as Wikipedia, Freebase from Metaweb, the Internet Movie Database (IMDB), and various map databases to understand when a phrase indicates an actual (or fictional) entity.
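To show what “phrases that tend to co-occur” might mean in practice, here is a simplified Python sketch. It counts how many documents contain each phrase and each pair of phrases, treats a pair as related when it co-occurs more often than it would if the phrases were independent, and then re-ranks candidate pages by how many phrases related to the query they contain. This is a toy version under stated assumptions, not Google’s implementation: the observed-to-expected ratio, the 1.5 threshold, the naive substring matching, and the function names phrases_in, related_phrases, and rerank are all my own hypothetical choices.

```python
from collections import Counter
from itertools import combinations


def phrases_in(doc, phrases):
    """Naive phrase detection: case-insensitive substring match (an assumption)."""
    text = doc.lower()
    return {p for p in phrases if p in text}


def related_phrases(docs, phrases, min_ratio=1.5):
    """Map phrase pairs to observed/expected co-occurrence ratios >= min_ratio."""
    n = len(docs)
    present = [phrases_in(d, phrases) for d in docs]
    single = Counter(p for s in present for p in s)
    pairs = Counter(c for s in present for c in combinations(sorted(s), 2))
    related = {}
    for (a, b), observed in pairs.items():
        # Number of documents expected to contain both phrases if independent.
        expected = single[a] * single[b] / n
        ratio = observed / expected
        if ratio >= min_ratio:
            related[(a, b)] = ratio
    return related


def rerank(candidates, query_phrase, related):
    """Order candidate pages by how many phrases related to the query they contain."""
    partners = {b if a == query_phrase else a
                for (a, b) in related if query_phrase in (a, b)}

    def score(doc):
        return len(phrases_in(doc, partners))

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    corpus = [
        "barack obama was born in honolulu",
        "barack obama visited honolulu",
        "the white house and barack obama",
        "stock market news",
        "stock market closes higher",
        "weather report",
    ]
    known = ["barack obama", "honolulu", "white house", "stock market"]
    rel = related_phrases(corpus, known)
    pages = ["barack obama speech",
             "barack obama in honolulu near the white house",
             "barack obama and honolulu"]
    print(rerank(pages, "barack obama", rel))
```

With this tiny sample corpus, “honolulu” and “white house” come out as related to “barack obama,” so the page mentioning both moves to the top. Real systems would work from far larger collections, where a ratio like this is much more meaningful.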
Google, Bing, and Yahoo all look for named entities on web pages and in search queries, and they use their recognition of those entities to do things like answer questions such as “where was Barack Obama born?”
It looks like Google and IBM are working together again to build up Google’s patent portfolio, according to an update at the United States Patent and Trademark Office (USPTO) patent assignment database. Details beyond the actual patents involved aren’t known yet. The last two times I wrote about large patent transactions between Google and IBM, this past July and September, Google sent emails a few hours after my posts to a number of large media outlets, such as the New York Times, Bloomberg News, the Wall Street Journal, and others, disclosing the acquisitions. We’ll see if they do that again.
In the last week of 2011, Google acquired 188 granted patents and 29 published pending patent applications from IBM, according to the USPTO assignment database, with an execution date on the assignment of December 28, 2011, in a deal that was officially recorded at the patent office on December 30, 2011.
The patents cover a broad range of topics, such as presentation software, blade servers, data caching, server load balancing, network performance, video conferencing, email administration, and instant messaging applications. A number of the patents cover specific internet, phone, and mobile phone technologies as well.
Back in 2007, I wrote about a Yahoo patent describing how Yahoo! might crawl a webpage, and then recrawl the same page around a minute later to see if any of the links on the page had changed. It might do that to identify what it called “Transient Links,” or links pointing to things like advertisements that might change on every visit to a page, which aren’t links that the search engine would want to crawl and index. The post is A Yahoo Approach to Avoid Crawling Advertisement and Session Tracking Links.
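The two-crawl comparison can be sketched in a few lines of Python. Given the HTML from two fetches of the same page taken a short time apart, the sketch extracts the href values of anchor elements and splits them into links present both times, which remain crawl candidates, and links that appeared or disappeared between fetches, which are treated as transient. This is a simplified illustration, not Yahoo’s actual process: the names LinkCollector, extract_links, and split_links are hypothetical, fetching and scheduling the recrawl are left out, and the use of a set symmetric difference is my own choice.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Gather the href values of <a> elements in a document."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def split_links(first_fetch, second_fetch):
    """Compare two fetches of the same page taken a short time apart."""
    first, second = extract_links(first_fetch), extract_links(second_fetch)
    stable = first & second      # present both times: worth crawling
    transient = first ^ second   # appeared or vanished: likely ads or session links
    return stable, transient


if __name__ == "__main__":
    v1 = '<a href="/about">About</a><a href="/ad?id=123">Ad</a>'
    v2 = '<a href="/about">About</a><a href="/ad?id=987">Ad</a>'
    print(split_links(v1, v2))
```

A session ID or rotating ad link changes between the two fetches and lands in the transient set, while ordinary navigation links stay stable.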
Google was granted a patent this week on a similar topic that looks at “transient” content on web pages. While this kind of content might include advertisements that change on return visits to a page, it could also include things like current weather forecasts (for example, “Warrenton, Virginia, 40 degrees and cloudy”). That kind of content changes regularly, but it often has little to do with the rest of the content on a page.
Google would want to identify transient content so that it doesn’t index pages based on it, and doesn’t show advertisements targeted to it either.
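One simple way to picture this is to compare several fetches of the same page block by block, keep the blocks that appear in every fetch, and index only those. The Python sketch below does that, treating each paragraph separated by a blank line as a block. It is my own illustration rather than the method claimed in Google’s patent; splitting on blank lines stands in for real page segmentation, the second weather reading in the example is made up, and the function names split_blocks, stable_and_transient, and indexable_text are hypothetical.

```python
def split_blocks(text):
    """Treat each non-empty paragraph (separated by blank lines) as a block."""
    return [" ".join(b.split()) for b in text.split("\n\n") if b.strip()]


def stable_and_transient(fetches):
    """Partition a page's blocks into those seen in every fetch and the rest."""
    block_sets = [set(split_blocks(f)) for f in fetches]
    stable = set.intersection(*block_sets)
    transient = set.union(*block_sets) - stable
    return stable, transient


def indexable_text(fetches):
    """Keep only stable blocks, in the order they appear in the latest fetch."""
    stable, _ = stable_and_transient(fetches)
    return "\n\n".join(b for b in split_blocks(fetches[-1]) if b in stable)


if __name__ == "__main__":
    monday = ("Recipe for bread\n\n"
              "Warrenton, Virginia, 40 degrees and cloudy\n\n"
              "Mix flour and water.")
    tuesday = ("Recipe for bread\n\n"
               "Warrenton, Virginia, 45 degrees and sunny\n\n"
               "Mix flour and water.")
    print(indexable_text([monday, tuesday]))
```

The weather block differs between the two fetches, so it is flagged as transient and left out of the text that would be indexed or used to pick advertisements.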
Apple’s latest phone has a slick voice control feature named Siri that lets you tell your phone to do a number of different things, and it can even run searches and answer them for you. There’s been some speculation that this type of verbal interaction might harm Google, because it would bypass the search advertisements that are Google’s primary way of earning money. It looks like Google isn’t taking that possibility lightly.
Will the future of searching involve speech-based searches that we do on our phones, with results shown on our TVs? A Google patent application describes the possibility.
The builder of the largest search engine in the world during the first decade of the 21st century joined Google shortly after building that search engine, and possibly licensed the technology behind it to Google. She worked for Google for a number of years, creating a way of indexing pages based on the meaningful phrases that appear on those pages: looking at how phrases co-occur on pages to cluster and rerank those pages, using phrases to identify spam pages and pages with duplicate content, and creating taxonomies and snippets for pages using phrases. This phrase-based indexing system provided a way to defeat Googlebombing, and to determine how much anchor text relevance should be passed along with links.
Then Anna Patterson left Google to start the search engine Cuil, which was supposed to be a Google killer. Except it wasn’t. Now she’s back at Google, and looks to be working on phrases again.
Google acquired a number of patents from a company that’s presently suing several major developers of wireless hardware devices for patent infringement. The company is Gold Bridge Technology (GBT), and they tell us on their “Meeting the Challenge” page:
One of GBT’s most significant group of patents pertains to the UMTS W-CDMA Standard. All equipment manufacturers and service providers providing 3rd Generation (“3G”) wireless service adhere to the technical specifications set by this standard. GBT has a number of patents that are essential to this standard and offers for license its portfolio of UMTS patents.
GBT has at least two pending lawsuits in Federal District Court in the District of Delaware based on two wireless patents, 6,574,267 and 7,359,427. Both patents have the title “Rach ramp-up acknowledgement.” GBT’s “Meeting the Challenge” page also tells us that its Random Access Channel (“RACH”) Ramp-up and Acknowledgment technology is the most widely used of its technologies.