Google News Algorithm Updated

The image above, from Google’s patent, shows a source rank that no longer exists in the same way that it did when this patent was first published. Algorithms can change over time, and this one likely did.

Neither The Nation nor Computerworld should write about patents. Period. Never. In the past couple of days Computerworld posted a “breaking news” story about the publication of a Google patent application from 9 months ago (not breaking news). The Nation wrote a followup story on Computerworld’s story, and made the same mistake.

Both saw optimism when they should have instead felt fear.

They weren’t publishing information about a 9-month-old patent, but rather a ten-year-old patent. They would have known that if they ever wrote about patents. 🙂

It was an easy enough mistake to make, and one that most journalists will make if they don’t know much about patents, or don’t ask for the help of someone who does. It’s a story about how Google ranks stories in Google News, and what signals they might look at when deciding which source to feature out of a cluster of similar stories about similar topics.

Neither The Nation nor Computerworld knew where to look (in the patent), and both ended up stuck in 2003.

In September of 2002, Google launched its news service, Google News. The news service ran as an automated system where the stories displayed were chosen algorithmically. In September of 2003, Google filed the patent, Systems and methods for improving the ranking of news articles. The patent describes how the automated news service might work, and how different news sources might be ranked when they publish a news story that might be substantially similar to other stories from other sources.

It took almost 6 years until the Google patent was granted, and I wrote about it the day after it was granted (breaking news, there) in the post Google News Rankings and Quality Scores for News Sources.

In my post, I wrote about how Google might create a source score for news sources that published news articles. The signals feeding that score included such things as:

  • Circulation statistics of the news source
  • The size of the staff associated with the news source
  • The number of news bureaus associated with the news source
  • Original named entities appearing in articles produced by the news source
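As a rough illustration of how signals like these could be folded into a single source score, here is a minimal Python sketch. The weighted-sum approach, the weights, the normalization caps, and all of the names are my own assumptions for illustration; the patent description lists the signals, and nothing here is Google's actual formula.

```python
from dataclasses import dataclass


@dataclass
class NewsSource:
    name: str
    circulation: int        # print circulation figure
    staff_size: int         # number of staff
    bureaus: int            # number of news bureaus
    original_entities: int  # original named entities across its articles


# Hypothetical weights and caps, chosen only to make the example concrete.
WEIGHTS = {"circulation": 0.3, "staff_size": 0.2, "bureaus": 0.2, "original_entities": 0.3}
CAPS = {"circulation": 1_000_000, "staff_size": 1_000, "bureaus": 50, "original_entities": 100}


def source_score(source: NewsSource) -> float:
    """Combine the signals into one score between 0 and 1 (assumed weighted sum)."""
    score = 0.0
    for signal, weight in WEIGHTS.items():
        raw = getattr(source, signal)
        normalized = min(raw / CAPS[signal], 1.0)  # clamp each signal to [0, 1]
        score += weight * normalized
    return score


big_daily = NewsSource("Big Daily", 2_000_000, 1_500, 60, 10)
physics_site = NewsSource("Physics Site", 50_000, 10, 0, 80)
print(source_score(big_daily), source_score(physics_site))
```

With weights like these, the large print operation outscores the small specialist site even though the specialist contributes far more original named entities.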

In the conclusion to the post, I questioned many of those assumptions and others from the patent (the role of traditional news agencies was already changing on the Web):

For instance, if a breaking story came out about a discovery in Physics, and a reputable and well-respected site on Physics News published an insightful and detailed article on the discovery, it’s possible that could be a better source for the topic than a news source which may have written about the discovery first, has many more reporters and much wider circulation, gets seen by a much more international audience, has a wide number of news bureaus, has been publishing since the 1800s, and was written by someone who doesn’t know much about physics at all.

In February of 2012, a new version of the Google patent was published as a pending application. (A second version was granted in 2012.) The third version has the same name as the first version, and it has substantially the same description section as the first version. What’s different is the “claims” section. The claims section of the new version of the patent starts with:

1-31. (canceled)

Gone are things like the “circulation statistics of the news source,” the “number of bureaus associated with the news source,” and other things associated with the kind of journalism that’s done in print.

That’s not how The Nation or Computerworld saw it.

The Nation published Patent Offers Clues on How Google Controls the News earlier today. It’s based on a Computerworld post from yesterday titled An inside look at Google’s news-ranking algorithm.

As we are told in the Computerworld post:

The metrics cited in the patent application include the number of articles produced by a news organization during a given time period; the average length of an article from a news source; and the importance of coverage from the news source.

Other metrics include a breaking news score, usage patterns, human opinion, circulation statistics and the size of the staff associated with a particular news operation.

The Nation copies that text from Computerworld. It also adds this section, and italicizes part of it (which I’ll reproduce):

A tenth metric may include a value representing the number of original named entities the news source produces within a cluster of related articles…[this is worthwhile because] if a news source generates a news story that contains a named entity that other articles [on the same topic] do not contain, this may be an indication that the news source is capable of original reporting (emphasis added).
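To make that “original named entities” metric concrete, here is a minimal Python sketch of one way it could be counted. It assumes the named entities have already been extracted from each article and that each source has one article in the cluster; the function name and the sample data are hypothetical, and this is not Google's implementation.

```python
from collections import Counter
from typing import Dict, Set


def original_entity_counts(cluster: Dict[str, Set[str]]) -> Dict[str, int]:
    """For each source, count the entities that no other source in the cluster mentions.

    `cluster` maps a source name to the set of named entities in its article.
    """
    # How many sources in the cluster mention each entity.
    mentions = Counter(entity for entities in cluster.values() for entity in entities)
    # An entity is "original" to a source if that source is the only one mentioning it.
    return {
        source: sum(1 for entity in entities if mentions[entity] == 1)
        for source, entities in cluster.items()
    }


cluster = {
    "Source A": {"CERN", "Higgs", "Dr. Smith"},
    "Source B": {"CERN", "Higgs"},
    "Source C": {"CERN", "Fermilab"},
}
print(original_entity_counts(cluster))
```

Here Source A and Source C each contribute one entity that nobody else in the cluster mentions, while Source B contributes none.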

The claims section of this third version of the patent ignores many of the metrics listed in the claims of the first version and listed in the description section of both. The Nation and Computerworld are reporting 10-year-old news by reporting on what was originally filed by Google in 2003.

The role of the traditional news agency has changed significantly since then in how Google ranks news articles. It may not be completely dead, but it’s much more likely that an online news source that breaks a story will stand a chance of ranking ahead of an agency with large print circulation stats, news bureaus, and large staffs of reporters.

If you want to learn more about the differences between the patents, find someone who knows how to read a patent.

Added February 23, 2013, @ 9:30 (est) – I would have expected that if Computerworld and The Nation were going to write about a patent filing, and they saw a version from last year that was still pending, they would have checked whether it had been granted and linked to the granted version of the patent. Unfortunately, they didn’t do that in this case.

The latest version of Google’s patent was granted on December 11, 2012, and can be found at: Systems and methods for improving the ranking of news articles (US Patent 8,332,382). I shouldn’t have relied upon them and should have checked myself.

I’ve been asked about the claims section above for the new version, and the “canceled” section that it starts with that I mentioned above. That’s been removed in the granted version of the patent, and there’s still no mention of things like circulation subscriptions and news bureaus, and so on, in the claims of this new version. I am going to download the original filing from the USPTO PAIR database (one page at a time), put those pages together, and make a copy accessible here so that we can see. It originally faced a non-final rejection and was amended, so I may download those as well. – Thanks.

Added February 23, 2013 @ 11:47 (est)

I’ve downloaded the original claims from the USPTO Public PAIR (Patent Application Information Retrieval) database and amended claims for the patent, as well as a request to amend the claims.

The original claims filed for the patent did include language that would favor a traditional news agency, such as considering circulation subscriptions, number of news agency bureaus, and so on. The amendment request came before any action at all by the USPTO, and the claims were amended by removing the sections that do appear to favor an older model of imputing more credibility and reputation to a traditional news agency model:

Original Claims for the Third Version of the Google News Patent Filing (pdf) 296 KB
Amendment Request (pdf) 26 KB
Amended Claims for the Google News Patent Filing (pdf) 121 KB

The third document is the one that includes the “1-31. (canceled)” language that I quote above.

This amendment to the original claims filed with the patent took place at the start of the patent case and was an intentional decision on the part of the filers from Google not to include the language that would favor traditional news agencies. I’m not sure why Google didn’t file the claims that now appear in the patent the first time around, but these new claims weren’t in response to a rejection of the claims or any other action by the USPTO.

In other words, Google intentionally left out things like news bureaus and subscription circulations from the claims that were considered by the patent office in this third version of the Google News Patent.

19 thoughts on “Google News Algorithm Updated”

  1. You’ve always been prescient when it comes to patents and here you put a spotlight on the very reason why the traditional media is doomed. It’s not the patent, it’s the continued failure of their industry to come to terms with the rapid change of media even so far as not being able/willing to do the leg work to understand and analyze the patents that are the mechanism of their eventual descent into obsolescence.

  2. Thanks, Jeremy.

    I had read about the posts from Computerworld and The Nation earlier tonight, after reading a Google Plus post from David Amerland titled Google News Algo, and I’ve been told that journalists were sharing the story of Google’s News ranking system as a very positive sign. They didn’t know that the sources they were relying upon were telling them good news from 2003. News that had unfortunately changed with the changes to the claims section of the continuation patent.

    Traditional media is doomed, and you’re right that it’s not the patent. The patent echoes a change in society.

  3. Pingback: Are These Google’s Ranking Signals For Google News? | WebProNews
  4. I’m sure the webpro news mis-spelling of your name didn’t grate? Or further emphasise the poor quality of journalism these days.

    Bill, we’ve not met but I’ve been a long time reader and fan of your insight. I had to get in touch. Phil

  5. I agree with Jeremy. ‘Traditional’ media is a dinosaur, it doesn’t matter who breaks the story as long as it’s factual and true.

    Being a cynic I would love to see a publisher of news suffering a ranking penalty for misleading and distorted news. That would set the cat among the pigeons.

  6. Couldn’t help but laugh; journalism and the written word this day and age is in a state of decline. From “old is new” stories like the one you’ve pointed out, Bill, to news items and blog posts just oozing with the now-typical-and-expected grammatical and spelling errors, not to mention the latest fad: the “random missing word” syndrome. Fact-checking, editing, proofing – all seemingly tasks of the past in an effort to be first out with the story. And this is not exclusive to the web/digital media arena by any means.

    What if, one day in the not-too-distant future, no one can make heads or tails of what the author originally intended to write about? What was the message? Did anyone understand? More importantly, was there anyone left to hear it?

  7. Hi Phil,

    Typos happen. Chris misspelled my name. 🙁

    Fortunately he got the link to this page right, and he bothered to update his article.

    That’s better than what I’ve seen at Computerworld, The Nation, Forbes, and the Guardian, who all seem to be in their own happy little worlds thinking that Google likes them, based upon not understanding that the ranking criteria Google used in 2003 (which they all are citing as something “new”) is no longer the criteria that Google uses.

    See:

    Google News: the secret sauce

    Copied on the author’s blog:

    Google News: The Secret Sauce

    Forbes reporting on the “Monday Note” article:

    Why Publishers Need to Stop Worrying and Learn to Love Google Forbes

  8. Hi Mick,

    I’m beginning to feel the same way as I watch this Google News patent information, as misguided as it is, flowing through an echo chamber of the media. Are there any journalists who would actually read the new claims section of the patent? I’ve left a few comments, but it looks like they are happy reporting misguided news about the future of journalism to other journalists.

    Google’s newest version of the patent isn’t saying good things about legacy media, and it is looking like it is merited. 🙁

  9. Hi DDWM,

    I know that there are print publications moving their operations completely online, or hiding their online content behind paywalls. The subscriptions to print versions of newspapers are in states of decline in many places.

    Google’s new patent and ranking signals for news stories move substantially away from judging the credibility of news sources based upon things like how many news bureaus they might have, or how many journalists work for them, or what their print media subscriptions might be like. If they could figure out how to read a patent correctly, they would know that, and they might take this threat a little more seriously.

    As I noted in my post above, they see optimism where they should be feeling fear.

  10. Hi Bill, great post as always.

    You ask, “Are there any journalists who would actually read the new claims section of the patent?” Sadly, today’s newsroom deadlines don’t allow for this type of research. Reporters are busy filing 2-3 stories per day, and preparing for interviews tomorrow. They’re wearing the hats of other positions long since eliminated. Radio reporters are turning their 30-second hits into stories for the paper, and print journalists are doing standups for TV.

    That’s the state of local news these days.

  11. This is a somewhat humorous and very interesting story for me. Some of the things you mentioned I have found to be true in my assessment of what sources are more valuable to me. A more well-written high-quality article is always more pleasurable for me to read than a story in a “breaking news” format.

  12. Where is the actual text that is the final version of the criteria for ranking news? Going into the history of how these news stories were in error, along with all of the versions of the patents, only clouds the issue even more.

    So, the question is, what is the criteria for how Google ranks news? (No history, no claims added in and out, and not necessarily even referencing the patent, just a list of the actual criteria being used today).

    A reply here or an email reply would be terrific.

    PS – Some of the links to PDFs above are empty.

  13. Reporters do this often, it certainly doesn’t just happen with patents. I see this quite a bit in health “news”. I see a headline, read, and it all sounds exciting. Unlike the newspapers, I dig deeper, seek the research paper to read and draw my own conclusions. So often it turns out the research paper was published a year or two ago, that people were talking about it in the news back then, but for some unknown reason it gets picked up again and re-circulated as “news”. Annoying, as it wastes my time! Nobody likes it when you point out that it is really old news either. Such is life.

  14. Hi T.

    The actual criteria for ranking news aren’t something that Google has divulged, and it’s very unlikely that they will.

    The criteria from the description section of the Google News patents is exactly the same for each of the three versions, and I wrote about those in detail back in 2009 at: Google News Rankings and Quality Scores for News Sources (I linked to it in the article above).

    You’re welcome to spend time reading that. They are the same signals that Computerworld and The Nation, and Forbes (twice), and CBS Marketwatch are reporting as if they were something new. It’s possible that Google is looking at other signals as well, and has stopped looking at some they were looking at in the past.

    The “claims” section of the newest version of the patent (linked to above) tells us the kind of criteria that Google may be looking at now (no absolute guarantees – I don’t work for Google, so I can only describe what I see within the patents). If you want to see those, go read the patent.

    The PDFs linked to at the bottom of the post are there, and the links worked fine for me a few minutes ago. Some of those files are really big (which is why I listed their file sizes), and you may have to wait a little while for them, but they do work. Sorry that you couldn’t open them, but they do work.

  15. Hi Andrew,

    It looks like the reporter at Computerworld tried to do some research by going to a professor of journalism at Columbia, but it would probably have helped if he ran it past a lawyer or a paralegal who specializes in IP or someone who knows more about patents.

    The people who reported based upon his story might have tried to do some fact checking too, but don’t appear to have dug too deeply. I do wonder how often something like this happens in the news when a story requires some specialized knowledge about things like medicine and science and the law. 🙁

  16. Bill, thank you so much for your reply – I do appreciate it!

    It sounds like the folks at Google have effectively muddied the waters. 🙂 Even though they won’t admit to anything, much of the information is hiding in plain view via the patents, if anyone can figure out what paragraphs still apply. Having said that, it is, as you mentioned, an extremely complex issue.

    Thanks again…

  17. Hi T.

    You’re welcome. I think it would be a mistake to blame Google here, and they didn’t muddy the waters – the Computerworld article did.

    Google filed a continuation patent in the manner that people have for years, with the same description section as the original patent from 2003. That’s how it’s done. They weren’t publishing a press release or marketing document intended to promote Google News or even Google. The claims sections are the paragraphs that make a difference and are what the people at the patent office look at when deciding whether to grant a patent or not. The claims section is very different than the original.
