How Google Might Identify Transient Content on Webpages

Back in 2007, I wrote about a Yahoo patent describing how Yahoo! might crawl a webpage, and then recrawl the same page around a minute later to see if any of the links on the page had changed. It might do that to try to identify what it called “Transient Links,” or links pointing to things like advertisements that might change on every visit to a page, which aren’t links that the search engine would want to crawl and index. The post is A Yahoo Approach to Avoid Crawling Advertisement and Session Tracking Links.
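To make the crawl-and-recrawl idea concrete, here is a minimal Python sketch. It is not the method from the Yahoo patent, only an illustration of the general idea under a simple assumption: a link that shows up in one crawl but not the other is a candidate transient link. The names LinkCollector, extract_links, and split_links, along with the sample HTML, are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every anchor tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def extract_links(html):
    """Return the set of link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def split_links(first_html, second_html):
    """Compare two crawls of the same page.

    Assumption for illustration: links present in both crawls are stable,
    and links present in only one crawl are candidate transient links.
    """
    first = extract_links(first_html)
    second = extract_links(second_html)
    stable = first & second
    transient = first ^ second
    return stable, transient


# Hypothetical snapshots of the same page taken about a minute apart.
first_crawl = '<a href="/about">About</a> <a href="/ad?id=123">Ad</a>'
second_crawl = '<a href="/about">About</a> <a href="/ad?id=456">Ad</a>'

stable, transient = split_links(first_crawl, second_crawl)
print("stable:", sorted(stable))
print("transient:", sorted(transient))
```

A real crawler would need to do more than a set difference, such as normalizing URLs and recognizing session IDs, but the comparison of two closely spaced crawls is the core of the idea described above.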

Google was granted a patent this week on a similar topic that looks at “transient” content on web pages. While this kind of content might include advertisements that change regularly on return visits to a page, it could also include things like current weather forecasts (Warrenton, Virginia, 40 degrees and cloudy), for example. That kind of content changes on a regular basis, but often has little to do with the content found elsewhere on a page.

Google would want to be able to identify transient content so that it wouldn’t index pages based upon it, and it wouldn’t show advertisements that focus upon it either.

Continue reading

Forget Siri: Google Voice Phone Searches May Display Results on TV

Apple’s latest phone has a slick voice control feature named Siri that lets you tell your phone to do a number of different things, and can even power searches that it will answer for you. There’s been some speculation that this type of verbal interaction might harm Google because it would bypass the search advertisements that are Google’s primary way of earning money. Looks like Google isn’t taking that possibility lightly.

Will the future of searching involve speech-based searches that we do on our phones, with results shown on our TVs? A Google patent application describes the possibility.

Images from the Google patent showing someone asking their phone when Seinfeld is on with the answers displayed on the large screen TV in front of them, and another image showing a flow of a voice search sent to a search engine and then a TV screen.

Continue reading

10 Most Important SEO Patents, Part 5 – Phrase Based Indexing

The builder of the largest search engine in the world during the first decade of the 21st century joined Google shortly after building that search engine, and possibly licensed the technology behind it to Google. She worked for Google for a number of years, creating a way of indexing pages based upon the meaningful phrases that appear upon those pages, looking at how phrases co-occur on pages to cluster and rerank those pages, using the phrases to identify spam pages and pages with duplicate content, and creating taxonomies and snippets for pages using phrases. This phrase-based indexing system provided a way to defeat Googlebombing, and to determine how much anchor text relevance should be passed along with links.

A screenshot from Phrase Based Indexing in an Information Retrieval System showing how phrases are identified as good phrases and bad phrases.

Then Anna Patterson left Google to start the search engine Cuil, which was supposed to be a Google killer. Except it wasn’t. Now she’s back at Google, and looks to be working on phrases again.

Continue reading

Google Acquires Significant 3G Patents

Google acquired a number of patents from a company that’s presently suing a number of major developers of wireless hardware devices for patent infringement. The company is Golden Bridge Technology (GBT), and they tell us on their “Meeting the Challenge” page:

One of GBT’s most significant group of patents pertains to the UMTS W-CDMA Standard. All equipment manufacturers and service providers providing 3rd Generation (“3G”) wireless service adhere to the technical specifications set by this standard. GBT has a number of patents that are essential to this standard and offers for license its portfolio of UMTS patents.

An image from the USPTO database showing the assignment of patents from GBT to Google.

GBT has at least two pending lawsuits in Federal District Court in the District of Delaware based upon a couple of wireless patents, 6,574,267 and 7,359,427. Those patents both have the title “Rach ramp-up acknowledgement.” The GBT Meeting page also tells us that their Random Access Channel (“RACH”) Ramp up and Acknowledgment technology is the most widely used of their technologies.

Continue reading

10 Most Important SEO Patents: Part 4 – PageRank Meets the Reasonable Surfer

PageRank is a measure of the probability that someone who starts out on any page on the Web, and randomly clicks on links they find on pages, or gets bored every so often and teleports (yes, that is official technical search engineer jargon) to a random page, will eventually end up at a specific page.
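That description can be sketched in a few lines of Python. This is an illustration and not Google's implementation: the four-page graph is made up, and the damping value of 0.85 (the chance that the surfer follows a link instead of teleporting) and the iteration count are assumptions chosen for the example. The function name pagerank and its parameters are hypothetical as well.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate the chance that a random surfer ends up on each page.

    links maps every page to the list of pages it links to; every link
    target must also appear as a key. damping and iterations are
    assumed values for this sketch.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start the surfer anywhere

    for _ in range(iterations):
        # Teleporting: a bored surfer jumps to a page chosen at random.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Clicking: split this page's rank across its links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no links: treat it as a jump to any page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to C, and nothing links to D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page C ends up with the highest score because three pages link to it, while D, which no page links to, is only ever reached by teleporting.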

Larry Page referred to this person clicking on links as a “random surfer.” Thing is, most people aren’t so random. It’s not like we’re standing at some street corner somewhere, and just randomly set off in some direction. (OK, I confess that I do sometimes do just that, especially when faced with a sign like that below.)

A street corner in The Plains, Virginia, with a sign showing distances to many other cities near and far.

Imagine someone from Google waking up in the middle of the night, with the thought, “Hmmmm. Maybe we’re not doing PageRank quite right. Maybe we should be doing things like paying attention to where links appear on a page, and other things as well.”

Continue reading

10 Most Important SEO Patents: Part 3 – Classifying Web Blocks with Linguistic Features

In the earlier days of SEO, many search engine optimization consultants stressed placing important and valuable content toward the top of the HTML code on pages, based upon the idea that search engines would weigh prominent content more heavily if it appeared early on in documents. There are still very well known SEO consultants who include information about a “table trick” on their sites describing how to move the main body content for a page above sidebar navigation within the HTML for a page using tables. I’ve also seen a similar trick used with CSS absolute positioning, where less important content appears higher on the page that visitors actually see, but lower in the HTML code for that page.

Back in 2003, the folks at Microsoft Research Asia published a paper titled VIPS: a Vision-based Page Segmentation Algorithm. The abstract for the paper describes the approach, telling us that:

A new web content structure analysis based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page adaptation can benefit from this structure. This paper presents an automatic top-down, tag-tree independent approach to detect web content structure. It simulates how a user understands web layout structure based on his visual perception. Comparing to other existing techniques, our approach is independent to underlying documentation representation such as HTML and works well even when the HTML structure is far different from layout structure.

Continue reading

10 Most Important SEO Patents: Part 2 – The Original Historical Data Patent Filing and its Children

Imagine gathering together 10 extremely knowledgeable search engineers, locking them into a room for a couple of days with walls filled with whiteboards, with the intent of having them brainstorm ways to limit stale content and web spam from ranking highly in search results. Add to their challenge that the methods they come up with should focus upon “the nature and extent of changes over time” to web sites. Once they’ve finished, then imagine taking what appears on those whiteboards and condensing it into a patent.

The end result would likely look like Google’s patent Information Retrieval based on Historical Data. When this patent was originally published as a pending patent application awaiting prosecution and approval back on March 31, 2005, it caused quite a stir in the SEO community. Here are a few of the many reactions in forums and blog posts as a result:

Continue reading

10 Most Important SEO Patents: Part 1 – The Original PageRank Patent Application

I like looking at patents and whitepapers and other primary sources from search engines to help me in my practice of SEO. I’ve been writing about them for more than 5 years now, and am putting together this series of the 10 Most Important SEO Patents to share some of what I’ve learned during that time. These aren’t patents about SEO, but rather ones that I would recommend to anyone interested in learning more about SEO by looking at patents from sources like Google or Microsoft or Yahoo.

The first PageRank patent application was never published by the United States Patent and Trademark Office (USPTO), was never assigned to a particular company or organization, and was never granted. It avoids the dense legal language and mathematics that can make reading patents difficult, and it captures the excitement of a Ph.D. student, Larry Page, who has just come up with a breakthrough in indexing webpages that had the potential to be a vast improvement over other search engines at the time.

The top of the cover letter for the provisional patent filing for PageRank.

Continue reading