The image above, from Google’s patent, shows a source rank that no longer exists in the same way that it did when this patent was first published. Algorithms can change over time, and this one likely did.
Neither The Nation nor Computerworld should write about patents. Period. Never. In the past couple of days, Computerworld posted a “breaking news” story about the publication of a Google patent application from 9 months ago (not breaking news). The Nation wrote a follow-up story on Computerworld’s story, and made the same mistake.
Both saw optimism when they should have instead felt fear.
They weren’t publishing information about a 9-month-old patent, but rather a ten-year-old patent. They would have known that if they ever wrote about patents. 🙂
It was an easy enough mistake to make, and one that most journalists will make if they don’t know much about patents, or don’t ask for the help of someone who does. It’s a story about how Google ranks stories in Google News, and what signals they might look at when deciding which source to feature out of a cluster of similar stories about similar topics.
Neither The Nation nor Computerworld knew where to look in the patent, and both ended up stuck in 2003.
In September of 2002, Google launched its news service, Google News. The news service ran as an automated system where the stories displayed were chosen algorithmically. In September of 2003, Google filed the patent, Systems and methods for improving the ranking of news articles. The patent describes how the automated news service might work, and how different news sources might be ranked when they publish a news story that might be substantially similar to other stories from other sources.
It took almost six years for the Google patent to be granted, and I wrote about it the day after it was granted (breaking news, there) in the post Google News Rankings and Quality Scores for News Sources.
In my post, I wrote about how Google might create a source score for news sources that published news articles. The signals behind that score included such things as:
- Circulation statistics of the news source
- The size of the staff associated with the news source
- The number of news bureaus associated with the news source
- Original named entities appearing in articles produced by the news source
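To make the idea of a source score a little more concrete, here is a minimal sketch. The patent names factors like these but doesn't say how they would be combined, so the equal weights, the normalization against the largest value among competing sources, and the names `SourceMetrics`, `WEIGHTS`, and `score_sources` are all hypothetical assumptions for illustration, not Google's actual method.

```python
from dataclasses import dataclass


@dataclass
class SourceMetrics:
    """Hypothetical per-source signals modeled on the factors listed above."""
    circulation: float        # circulation statistics of the news source
    staff_size: float         # size of the staff
    bureau_count: float       # number of news bureaus
    original_entities: float  # original named entities in its articles


# Hypothetical equal weights; the patent does not publish any weights.
WEIGHTS = {
    "circulation": 0.25,
    "staff_size": 0.25,
    "bureau_count": 0.25,
    "original_entities": 0.25,
}


def score_sources(sources: dict[str, SourceMetrics]) -> dict[str, float]:
    """Combine each source's metrics into a score in [0, 1] using a weighted sum (an assumption)."""
    # Largest value of each metric across all sources, used to normalize.
    maxima = {name: max(getattr(m, name) for m in sources.values()) for name in WEIGHTS}
    scores = {}
    for source, metrics in sources.items():
        total = 0.0
        for name, weight in WEIGHTS.items():
            ceiling = maxima[name]
            value = getattr(metrics, name)
            # Avoid dividing by zero when no source reports this metric.
            total += weight * (value / ceiling if ceiling else 0.0)
        scores[source] = total
    return scores


if __name__ == "__main__":
    demo = {
        "Big Print Agency": SourceMetrics(100, 10, 4, 20),
        "Smaller Outlet": SourceMetrics(50, 5, 2, 10),
    }
    print(score_sources(demo))
```

A scheme like this shows why the factors favor traditional print agencies: three of the four inputs reward sheer size, so a small but expert source can't score well on them no matter how good its reporting is.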
In the conclusion to the post, I questioned many of those assumptions and others from the patent (the role of traditional news agencies was already changing on the Web):
For instance, if a breaking story came out about a discovery in Physics, and a reputable and well-respected site on Physics News published an insightful and detailed article on the discovery, it’s possible that could be a better source for the topic than a news source which may have written about the discovery first, has many more reporters and much wider circulation, gets seen by a much more international audience, has a wide number of news bureaus, has been publishing since the 1800s, and was written by someone who doesn’t know much about physics at all.
In February of 2012, a new version of the Google patent was published as a pending application. (A second version was granted in 2012.) This third version has the same name as the first version, and substantially the same description section. What’s different is the “claims” section. The claims section of the new version of the patent starts with:
Gone are things like the “circulation statistics of the news source,” the “number of bureaus associated with the news source,” and other things associated with the kind of journalism that’s done in print.
That’s not how The Nation or Computerworld saw it.
The Nation published Patent Offers Clues on How Google Controls the News earlier today. It’s based on a Computerworld post from yesterday titled An inside look at Google’s news-ranking algorithm.
As we are told in the Computerworld post:
The metrics cited in the patent application include the number of articles produced by a news organization during a given time period; the average length of an article from a news source; and the importance of coverage from the news source.
Other metrics include a breaking news score, usage patterns, human opinion, circulation statistics and the size of the staff associated with a particular news operation.
The Nation copies that text from Computerworld. It also adds this section, and italicizes part of it (which I’ll reproduce):
A tenth metric may include a value representing the number of original named entities the news source produces within a cluster of related articles…[this is worthwhile because] if a news source generates a news story that contains a named entity that other articles [on the same topic] do not contain, this may be an indication that the news source is capable of original reporting (emphasis added).
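The “original named entities” metric quoted above is easy to picture in code. This is a minimal sketch under stated assumptions: named entities have already been extracted from each article (entity extraction itself is out of scope here), and an entity counts as original for a source when no other source in the same cluster mentions it. The function name `original_entity_counts` and the sample entities are hypothetical.

```python
from collections import defaultdict


def original_entity_counts(cluster: list[tuple[str, set[str]]]) -> dict[str, int]:
    """Count, for each source, the named entities no other source in the cluster mentions.

    `cluster` is a list of (source_name, entities_in_article) pairs for one story
    cluster; extracting the entities is assumed to have happened already.
    """
    # Merge entities across all articles a source contributed to the cluster.
    by_source: dict[str, set[str]] = defaultdict(set)
    for source, entities in cluster:
        by_source[source] |= entities

    counts = {}
    for source, entities in by_source.items():
        # Everything mentioned by the *other* sources in this cluster.
        others = set().union(*(e for s, e in by_source.items() if s != source))
        counts[source] = len(entities - others)
    return counts


if __name__ == "__main__":
    cluster = [
        ("Wire Service", {"NASA", "CERN"}),
        ("Physics News Site", {"NASA", "CERN", "Higgs boson"}),
        ("Daily Paper", {"NASA"}),
    ]
    print(original_entity_counts(cluster))
```

In this made-up cluster, only the physics site names something the others don’t, which is the kind of signal that could help a smaller, specialized source that breaks a story.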
The claims section of this third version of the patent ignores many of the metrics listed in the claims of the first version and in the description sections of both. The Nation and Computerworld are reporting 10-year-old news by reporting on what Google originally filed in 2003.
Since then, the role that traditional news agencies play in how Google ranks news articles has changed significantly. That role may not be completely dead, but an online news source that breaks a story now stands a much better chance of ranking ahead of an agency with large print circulation stats, many news bureaus, and a large staff of reporters.
If you want to learn more about the differences between the patents, find someone who knows how to read a patent.
Added February 23, 2013, @ 9:30 (est) – Unfortunately, I would have expected that if Computerworld and The Nation were going to write about a patent filing, and they saw a version from last year that was still pending, they would have checked on it to see whether it had been granted, and would have linked to the granted version of the patent. They didn’t do that in this case.
The latest version of Google’s patent was granted on December 11, 2012, and can be found at: Systems and methods for improving the ranking of news articles (US Patent 8,332,382). I shouldn’t have relied upon them and should have checked myself.
I’ve been asked about the claims section of the new version that I mentioned above, and the “canceled” language it starts with. That language has been removed in the granted version of the patent, and there’s still no mention of things like circulation subscriptions and news bureaus in the claims of this new version. I am going to download the original filing from the USPTO PAIR database (one page at a time), put those pages together, and make a copy accessible here so that we can all see it. The application originally faced a non-final rejection and was amended, so I may download those documents as well. – Thanks.
Added February 23, 2013 @ 11:47 (est)
I’ve downloaded the original claims for the patent, the amended claims, and the request to amend the claims from the USPTO Public PAIR (Patent Application Information Retrieval) database.
The original claims filed for the patent did include language that would favor a traditional news agency, such as considering circulation subscriptions, the number of news agency bureaus, and so on. The amendment request came before any action at all by the USPTO, and the claims were amended by removing the sections that appear to favor the older model of attributing more credibility and reputation to traditional news agencies:
Original Claims for the Third Version of the Google News Patent Filing (pdf) 296 KB
Amendment Request (pdf) 26 KB
Amended Claims for the Google News Patent Filing (pdf) 121 KB
The third document is the one that includes the “1-31. (canceled)” language that I quote above.
This amendment to the original claims took place at the start of the patent case and was an intentional decision on the part of the filers from Google not to include the language that would favor traditional news agencies. I’m not sure why Google didn’t file the claims that now appear in the patent in the first place, but these new claims weren’t a response to a rejection of the claims or any other action by the USPTO.
In other words, Google intentionally left out things like news bureaus and subscription circulations from the claims that were considered by the patent office in this third version of the Google News Patent.