Evolution of Google’s News Ranking Algorithm

New Ranking Algorithm Based on Entities

Image: Photo by Nathan Dumlao on Unsplash

Did the Algorithm Behind How News Articles Rank at Google Just Change?

A Google patent on how news articles are ranked was updated this week, and the new version suggests that the entities mentioned in those articles can have an impact on ranking.

How Have News Articles Been Ranked at Google?

This patent was originally filed in 2003.

The beta version of Google News was first launched by Google in 2002, so this was one of the early patents that described how Google ranked news articles.

One of the inventors of the original patent was Krishna A. Bharat, known as a founder of Google News.

The newest version (a continuation patent) was just granted and is the sixth version of the patent. It can be found at:

Systems and methods for improving the ranking of news articles
Inventors: Michael Curtiss, Krishna A. Bharat, and Michael Schmitt
Assignee: Google LLC
US Patent: 10,459,926
Granted: October 29, 2019
Filed: April 27, 2015

This version of the patent gives the history of the five earlier versions, including when they were filed and the patent numbers listed for them:

This application is a

(1) continuation of U.S. patent application Ser. No. 14/140,108, filed on Dec. 24, 2013, which is a

(2) continuation of U.S. patent Ser. No. 13/616,659, filed on Sep. 14, 2012 (now U.S. Pat. No. 8,645,368), which is a

(3) continuation of U.S. patent application Ser. No. 13/404,827, filed Feb. 24, 2012, (now U.S. Pat. No. 8,332,382), which is a

(4) continuation of U.S. patent application Ser. No. 12/501,256, filed on Jul. 10, 2009, (now U.S. Pat. No. 8,126,876), which is a

(5) continuation of U.S. patent application Ser. No. 10/662,931, filed Sep. 16, 2003, (now U.S. Pat. No. 7,577,655),

the disclosures of which are hereby incorporated by reference herein.

What a Continuation Patent Is

Continuation patents take the filing date of the patent they continue (or of the patents those in turn continue) and are intended to show how the process described by the patents has changed. The processes are set out in the claims sections of the patents, which are the parts that the patent examiner reviews when deciding whether or not to grant the new patent.

Often, looking at the very first claim of each patent can help identify important aspects that have changed from one version of a patent to another. It is somewhat rare (in my experience) to see a patent that has gone through six versions, as this one has. I recently wrote about Google’s Universal Search Interface patent, which was updated a fourth time, in Google’s New Universal Search Results.

What Caused A Recent Rankings Change at the New York Times?

A post on Twitter this week suggested that The New York Times may have been negatively impacted by BERT, a new algorithm that Google had just released and announced in Understanding searches better than ever before.

That tweet tells us that BERT may have had an impact, or that a move to mobile-first indexing may have caused a loss of rankings at the newspaper’s site. But seeing that tweet, and seeing that there was a new version of this patent, made me curious about what it contained and what changes it may have brought about.

The Changing Claims from the Ranking of News Articles Patents

But it’s possible that other changes at Google could also have an impact on rankings at news sites. One way to tell how Google has changed how it ranks articles is to look at how the patent covering the ranking of news articles has changed over time.

Compare how the first four claims from this patent have changed over time.

The latest first claim in this patent introduces some new things to look at:

What is claimed is:

1. A method for ranking results, comprising: receiving a list of objects; identifying a first object in the list and a first source with which the first object is associated; identifying a second object in the list and a second source with which the second object is associated; determining a quantity of named entities that (i) occur in the first object that is associated with the first source, and (ii) do not occur in objects that are identified as sharing a same cluster with the first object but that are associated with one or more sources other than the first source; computing, based at least on the quantity of named entities that (i) occur in the first object that is associated with the first source, and (ii) do not occur in objects that are identified as sharing a same cluster with the first object but that are associated with one or more sources other than the first source, a first quality value of the first source using a first metric, wherein a named entity corresponds to a person, place, or organization; computing a second quality value of the second source using a second metric that is different from the first metric; and ranking the list of objects based on the first quality value and the second quality value.

2. The method of claim 1 wherein the identifying the first source with which the first object is associated includes: identifying the first source based on a uniform resource locator (URL) associated with the first object.

3. The method of claim 1 wherein the first source is a news source.

4. The method of claim 1 wherein computing the first quality value of the first source is further based on: one or more of a number of articles produced by the first source during a first time period, an average length of an article produced by the first source, an amount of important coverage that the first source produces in a second time period, a breaking news score, network traffic to the first source, a human opinion of the first source, circulation statistics of the first source, a size of a staff associated with the first source, a number of bureaus associated with the first source, a breadth of coverage by the first source, a number of different countries from which traffic to the first source originates, and a writing style used by the first source.

From the version of the patent that was filed on Sep. 14, 2012 (now U.S. Pat. No. 8,645,368):

What is claimed is:

1. A method comprising: determining, using one or more processors and based on receiving a search query, articles and respective scores; identifying, using one or more processors, for an article of the articles, a source with which the article is associated; determining, using one or more processors, a score for the source, the score for the source being based on: a metric that represents an evaluation, by one or more users, of the source, and an amount of traffic associated with the source; and adjusting, using one or more processors, the score of the article based on the score for the source.

2. The method of claim 1, where identifying the source includes identifying the source based on an address associated with the article.

3. The method of claim 1, where determining the score includes accessing a memory to determine the score for the source.

4. The method of claim 1, where the score for the source is further based on a length of time between an occurrence of an event and publication, by the source, of an article associated with the event.

From the version of the patent that was filed on Feb. 24, 2012 (now U.S. Pat. No. 8,332,382):

What is claimed is:

1. A computer-implemented method comprising: obtaining, in response to receiving a search query, articles and respective scores; identifying, using one or more processors, for an article of the articles, a source with which the article is associated; determining, using one or more processors, a score for the source, based on polling one or more users to request the one or more users to provide a metric that represents an evaluation of a source and based on a length of time between an occurrence of an event and publication, by the source, of another article associated with the event; and adjusting, using one or more processors, the score of the article based on the score for the source.

2. The method of claim 1, where identifying the source includes identifying the source based on an address associated with the article.

3. The method of claim 1, where adjusting the score of the article includes: determining, using the score for the source, a new score for the article associated with the source; and adjusting the score of the article based on the determined new score.

4. The method of claim 1, where the score for the source is further based on a usage pattern indicating traffic associated with the source.

From the version of the patent that was filed on July 10, 2009 (now U.S. Pat. No. 8,126,876):

What is claimed is:

1. A method, performed by one or more server devices, the method comprising: receiving, at one or more processors of the one or more server devices, a search query, from a client device; generating, by one or more processors of the one or more server devices and in response to receiving the search query, a list of references to news articles; identifying, by one or more processors of the one or more server devices and for each reference in the list of references, a news source with which each reference is associated; determining, by one or more processors of the one or more server devices and for each identified news source, whether a news source rank exists; determining, by one or more processors of the one or more server devices and for each reference with an existing corresponding news source rank, a new score by combining the news source rank and a score corresponding to a previous ranking of the reference; and ranking, by one or more processors of the one or more server devices, the references in the list of references based, at least in part, on the new scores.

2. The method of claim 1, where determining whether each news source rank exists includes accessing a database to locate the news source rank.

3. The method of claim 1, further comprising: providing the ranked list of references to the client device.

4. The method of claim 1, where determining the new score comprises: determining, for each reference with an existing corresponding news source rank, a weighted sum of the news source rank and the score corresponding to the previous ranking of the reference.
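The re-ranking step in claims 1 and 4 of this version can be sketched in a few lines of Python. This is only an illustration of a weighted sum under stated assumptions, not Google's implementation: the Reference type, the rerank function, and the 0.3 weight are all hypothetical, and the claims do not say what happens to a reference whose source has no rank, so the sketch simply keeps its previous score.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reference:
    """A reference to a news article (hypothetical structure for illustration)."""
    url: str
    source: str
    previous_score: float  # score from the earlier, query-based ranking


def rerank(references: List[Reference],
           source_ranks: Dict[str, float],
           source_weight: float = 0.3) -> List[Reference]:
    """Re-rank references using a weighted sum of source rank and previous score.

    source_weight is an assumed value; the claims name no specific weights.
    """
    scored = []
    for ref in references:
        rank = source_ranks.get(ref.source)
        if rank is None:
            # Assumption: with no news source rank, the previous score is kept.
            new_score = ref.previous_score
        else:
            # Weighted sum of the news source rank and the previous score.
            new_score = source_weight * rank + (1 - source_weight) * ref.previous_score
        scored.append((new_score, ref))
    # Highest new score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ref for _, ref in scored]
```

With these made-up numbers, a reference from a highly ranked source can overtake one that had a better score from the earlier ranking, which is the effect the claim describes.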

And the very first version of the patent, filed on September 16, 2003 (now U.S. Pat. No. 7,577,655):

What is claimed is:

1. A method comprising: determining, by a processor, one or more metric values for a news source based at least in part on at least one of a number of articles produced by the news source during a first time period, an average length of an article produced by the news source, an amount of coverage that the news source produces in a second time period, a breaking news score, an amount of network traffic to the news source, a human opinion of the news source, circulation statistics of the news source, a size of a staff associated with the news source, a number of bureaus associated with the news source, a number of original named entities in a group of articles associated with the news source, a breadth of coverage by the news source, a number of different countries from which network traffic to the news source originates, or a writing style used by the news source determining, by the processor, an importance metric value representing the amount of coverage that the news source produces in a second time period, where the determining an importance metric includes: determining, by the processor, for each article produced by the news source during the second time period, a number of other non-duplicate articles on a same subject produced by other news sources to produce an importance value for the article, and adding, by the processor, the importance values to obtain the importance metric value; generating, by the processor, a quality value for the news source based at least in part on the determined one or more metric values; and using, by the processor, the quality value to rank an object associated with the news source.

2. The method of claim 1 where the determining includes: determining, by the processor, a plurality of metric values for the news source.

3. The method of claim 2 where the generating includes: multiplying, by the processor, each metric value in the plurality of metric values by a factor to create a plurality of adjusted metric values, and adding, by the processor, the plurality of adjusted metric values to obtain the quality value.

4. The method of claim 3 where the plurality of metric values includes a predetermined number of highest metric values for the news source.
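To make the arithmetic in these claims concrete, here is a rough Python sketch of two pieces: the importance metric from claim 1 (for each article by a source, count the articles on the same subject from other sources, then add those counts up) and the quality value from claims 3 and 4 (multiply each metric value by a factor and add the results, optionally using only a set number of the highest metric values). The function names, the data layout, and every number used with them are assumptions for illustration. The sketch also assumes duplicates have already been removed and that metric values are on comparable scales.

```python
from typing import Dict, List, Optional, Tuple


def importance_metric(source: str, articles: List[Tuple[str, str]]) -> int:
    """Sum, over each article by `source`, the number of articles on the same
    subject from other sources. `articles` holds (source, subject) pairs and
    is assumed to be free of duplicates already."""
    total = 0
    for art_source, subject in articles:
        if art_source != source:
            continue
        # Importance value for this one article.
        total += sum(
            1
            for other_source, other_subject in articles
            if other_source != source and other_subject == subject
        )
    return total


def quality_value(metrics: Dict[str, float],
                  factors: Dict[str, float],
                  top_n: Optional[int] = None) -> float:
    """Multiply each metric value by its factor and add the results.
    If top_n is given, only the top_n highest metric values are used."""
    items = sorted(metrics.items(), key=lambda kv: kv[1], reverse=True)
    if top_n is not None:
        items = items[:top_n]
    return sum(value * factors[name] for name, value in items)
```

A source that covers the stories many other sources also cover earns a higher importance metric, and that metric is one of several that feed the overall quality value.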

How the News Ranking Claims Differ

An analysis of changes over time to the patent for “Systems and methods for improving the ranking of news articles” should reflect how Google has changed how it implements that patent.

We can see in the claims for the very first patent (filed in 2003) that Google was looking at metric values for different news sources to rank the content that those sources were creating. That very long first claim from that version of the patent lists a number of metrics to use to rank news sources, and that ranking influenced the ranking of news articles. So a story from a very well-known news agency would have a tendency to rank higher than a story from a lesser-known agency.

The version of the patent filed in 2009 still focuses upon news sources (and a “news source rank”), along with references to the news articles generated by those news sources.

The version of the patent filed in February 2012 again tells us about a score for a news article that is influenced by a score for a news source, but it doesn’t include the many metrics that the 2003 version of the patent does.

The version of the patent filed in September 2012 holds on to the score for the source, but tells us that the score is based on a metric that represents an evaluation of the source by one or more users and on the amount of traffic associated with the source, and that the score for the article is adjusted based on the score for the source.

The most recent published version of this patent, filed in April 2015 and granted in October 2019, introduces some changes in how news articles may be ranked by Google. It tells us how articles are placed in clusters (which isn’t new in itself), and how a source’s articles may rank higher when they mention named entities that articles from other sources in the same cluster do not.
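The quantity at the heart of the newest first claim can be sketched in Python as follows: count the named entities that occur in an article but do not occur in articles from other sources in the same cluster. This is a minimal sketch under assumptions. The function name and data layout are hypothetical, named entities (people, places, organizations) are assumed to have been extracted already, and the patent does not say how this count is turned into a quality value for the source.

```python
from typing import List, Set, Tuple


def novel_entity_count(article_entities: Set[str],
                       article_source: str,
                       cluster: List[Tuple[str, Set[str]]]) -> int:
    """Count named entities in an article that no article from a different
    source in the same cluster mentions.

    `cluster` holds (source, entities) pairs for the articles in the cluster.
    Articles from the same source are ignored, as in the claim language.
    """
    seen_elsewhere: Set[str] = set()
    for source, entities in cluster:
        if source != article_source:
            seen_elsewhere |= entities
    # Entities that appear in this article and nowhere else in the cluster.
    return len(article_entities - seen_elsewhere)
```

An article that adds people, places, or organizations that competing coverage of the same story leaves out would score higher on this count.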

7 thoughts on “Evolution of Google’s News Ranking Algorithm”

  1. This is fascinating! As someone who works in Editorial at a News Publisher, would you have any recommendations on how to best write articles which successfully get our topics clustered to be ranked higher? Does ‘covering more entities’ mean more articles about a single topic but looking at different elements?

    Thanks as ever for your brilliant insights

  2. Hi Nick,

    I wouldn’t be surprised if reporters at your news organization, asked about getting more entities mentioned in the stories you provide, had ideas on how to do that. Those could include interviewing people about events that they might have some interest in, and reporting on more of the places that may be impacted by stories and on what is happening in those places. Named entities are specific people, places, and things, and having more of those in a story could make it more informative and more interesting. Readers will appreciate that, and it now appears that Google may too.

  3. Excellent read, Positive site, where did u come up with the information on this posting? I have read a few of the articles on your website now, and I really like your style. Thanks a million and please keep up the effective work

  4. Hi Rakesh,

    I came up with the information on this post from the United States Patent and Trademark Office website (USPTO.gov), where newly granted and newly published pending patents are added every week.

    I saw that there was a new version of this patent, and searched for the older versions so that I could compare the different claims on those, and see how they changed in the 6 versions with the first filed in 2003.

    Since the patents are filed by companies like Google to protect the intellectual property they have developed when they invent something, the patent filings are fairly straightforward and free of marketing language. Since the patents are reviewed by patent examiners to determine whether they should be granted, they are at a fairly high standard of quality.

    Glad you liked the posts you read.

  5. I have been reading your blog for some time, and I like most of your content, mostly the posts about Google. Whenever I need to know something new about what Google is doing, I visit your website. Thanks for sharing deep knowledge.

  6. Hello Bill Slawski,

    Amazing article this one is about the Evolution of Google’s News Ranking Algorithm. This is very helpful for the Search Engine Optimizers. Thanks for sharing this amazing information with us.
