Google’s Panda Granted a Patent on Ranking Search Results

One of the most impactful updates at Google was the Panda Update, released in February 2011 and affecting almost 12% of all search results. In a Wired interview with Google’s Amit Singhal and Matt Cutts, “TED 2011: The ‘Panda’ That Hates Farms: A Q&A With Google’s Top Search Engineers,” the update’s name was revealed to come from a Google engineer who played a significant role in its development:

Wired.com: What’s the code name of this update? Danny Sullivan of Search Engine Land has been calling it “Farmer” because its apparent target is content farms.

Amit Singhal: Well, we named it internally after an engineer, and his name is Panda. So internally we called a big Panda. He was one of the key guys. He basically came up with the breakthrough a few months back that made it possible.

There were at least a couple of search engineers at Google with the last name of Panda, and a review of what either had written led to some interesting information, but not much about the Panda Update itself. At some point in time, Google’s Navneet Panda included the following statement on his Google Plus About Page:

Navneet Panda includes the Panda Update in his “bragging rights.”*

I’ve been keeping an eye out for any patents from Google that have his name on them, and one was granted today.

Ranking search results
Invented by Navneet Panda and Vladimir Ofitserov
Assigned to Google
US Patent 8,682,892
Granted March 25, 2014
Filed: September 28, 2012

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for ranking search results. One of the methods includes:

  • Determining, for each of a plurality of groups of resources, a respective count of independent incoming links to resources in the group;
  • Determining, for each of the plurality of groups of resources, a respective count of reference queries;
  • Determining, for each of the plurality of groups of resources, a respective group-specific modification factor, wherein the group-specific modification factor for each group is based on the count of independent links and the count of reference queries for the group; and
  • Associating, with each of the plurality of groups of resources, the respective group-specific modification factor for the group, wherein the respective group-specific modification for the group modifies initial scores generated for resources in the group in response to received search queries.
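
To make the abstract’s steps a little more concrete, here is a minimal Python sketch of how the counts and the factor might fit together for a single group. It is not Google’s code. Two details come from the patent as quoted in the comments below: links aren’t treated as independent when they are “owned by the same entity, hosted by the same entity, or that were created by the same entity,” and the initial modification factor is the ratio of independent links to reference queries. Everything else is an assumption made for illustration: the hypothetical `Entity` and `Link` types, taking the reference query count as a given input, using a factor of 1.0 when there are no reference queries, and simply multiplying each resource’s initial score by the factor, since the abstract only says the factor “modifies” those scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Who owns, hosts, and created a resource (hypothetical fields)."""
    owner: str
    host: str
    creator: str


@dataclass(frozen=True)
class Link:
    source: Entity  # the resource the link comes from
    target: Entity  # the group of resources the link points into


def is_independent(link: Link) -> bool:
    # A link is treated as not independent when both ends share an owner,
    # a host, or a creator.
    s, t = link.source, link.target
    return not (s.owner == t.owner or s.host == t.host or s.creator == t.creator)


def count_independent_links(links: list[Link]) -> int:
    # Step 1: count only the independent incoming links to the group.
    return sum(is_independent(link) for link in links)


def initial_modification_factor(independent_links: int, reference_queries: int) -> float:
    # Step 3: ratio of independent links to reference queries.
    # Returning 1.0 (no change) when there are no reference queries is an
    # assumption made here to avoid dividing by zero.
    if reference_queries == 0:
        return 1.0
    return independent_links / reference_queries


def modified_scores(initial: dict[str, float], factor: float) -> dict[str, float]:
    # Step 4 (assumed): scale each resource's initial score by the group's factor.
    return {url: score * factor for url, score in initial.items()}


if __name__ == "__main__":
    site = Entity(owner="acme", host="acme-hosting", creator="acme-staff")
    links = [
        Link(Entity("news", "news-host", "reporter"), site),  # independent
        Link(Entity("blog", "cheap-host", "blogger"), site),  # independent
        Link(Entity("acme", "other-host", "intern"), site),   # same owner
        Link(Entity("spam", "acme-hosting", "bot"), site),    # same host
    ]
    n = count_independent_links(links)
    # Step 2 is not modeled here; 4 reference queries is a made-up input.
    factor = initial_modification_factor(n, reference_queries=4)
    print(n, factor)  # prints: 2 0.5
    print(modified_scores({"http://www.example.com/a": 0.8}, factor))
```

The patent itself says the group-specific factor is *based on* this initial ratio, so a real system would likely adjust or normalize it further; the straight multiplication above is only a placeholder for that step.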

It’s going to take a while to drill down into the process described in this patent and make sense of how it might work, but I’ll be tackling that. A quick run through the claims and the description section of the patent reveals some interesting details. While this is the first published or granted patent we’ve seen from Navneet Panda, that doesn’t mean he doesn’t have others presently being prosecuted at the patent office.

The patent appears to describe a way of ranking pages by classifying them according to the links pointing to them, the queries that refer to them, and how well the pages fit as navigational results for those queries.
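
The idea of a query “referring” to a page is probably the least familiar piece of this. The patent’s own example, quoted in the comments below, treats all or part of a URL, such as “example.com,” as a term that refers to a resource, along with shorthand that people commonly use, such as “example sf” or “esf.” Here is a rough Python sketch of counting reference queries for one group. It assumes that terms are matched as whole words or phrases, that only a URL’s hostname (with and without a leading “www.”) counts as a URL-derived term, and that the alias terms are supplied as input rather than learned; the function names are hypothetical.

```python
from urllib.parse import urlparse


def url_terms(url: str) -> set[str]:
    # Terms derived from the resource identifier: the hostname, plus the
    # hostname without a leading "www." (an assumption about what counts
    # as "a portion of" the URL).
    host = urlparse(url).hostname or ""
    if not host:
        return set()
    terms = {host}
    if host.startswith("www."):
        terms.add(host[len("www."):])
    return terms


def refers_to(query: str, terms: set[str]) -> bool:
    # Whole-word/phrase match, so "esf" matches "esf restaurant reviews"
    # but not "chesf".
    padded = f" {query.lower()} "
    return any(f" {term.lower()} " in padded for term in terms)


def count_reference_queries(group: dict[str, set[str]], query_log: list[str]) -> int:
    # `group` maps each resource URL in the group to alias terms the system is
    # assumed to already know (e.g. "esf"). A query is counted once, even if it
    # refers to several resources in the group.
    all_terms: set[str] = set()
    for url, aliases in group.items():
        all_terms |= url_terms(url) | aliases
    return sum(refers_to(query, all_terms) for query in query_log)


if __name__ == "__main__":
    group = {
        "http://www.example.com": set(),
        "http://www.sf.example.com": {"example sf", "esf"},
    }
    log = ["example.com login", "example sf news", "esf restaurant reviews",
           "sf weather", "chesf"]
    print(count_reference_queries(group, log))  # prints: 3
```

Matching on padded whole words is a deliberately crude choice; it keeps “esf” from matching inside an unrelated word, but a production system would need real tokenization and a learned table of which terms refer to which resources.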

*It appears that, after this post was published, Navneet Panda removed the “bragging rights” section from his Google Plus profile; that section had claimed he was the “Father. Author of the Google Panda Update.”


36 thoughts on “Google’s Panda Granted a Patent on Ranking Search Results”

  1. Hi Praveen,

    I’m not sure that’s quite a fair thing to say. The update that was named after him has affected a number of websites negatively, but it also significantly addressed a number of criticisms that were being aimed at Google regarding the quality of pages being returned by Google.

    This patent does appear to be an opportunity for us to learn more about his work, so it’s one that we should take advantage of.

  2. Hey Bill

    Interesting how they can qualify “queries that refer to the pages.” Is that specifically based on anchor text as a URL? Or on citations of a page / page title / URL?

    Google has become more of a “connections engine” and (only looking at your brief description above) it seems this is another indicator of a connection that builds authority or “rank-worthiness”.

    Keep surfacing this stuff Bill… Keeping these types of patents (even if they might not be used in the actual algorithm) in mind helps every SEO understand a little better about what the hell we do and why :-)

    Cheers

  3. Would the fact that this was originally filed in 2012 mean that the techniques mentioned in the patent have changed by now? I’m assuming their techniques change quickly, right?

  4. Thanks, Grant!

    There were some interesting granted patents this week, but I almost spit out my coffee when I saw the name Navneet Panda on this one.

    There’s some interesting discussion on different types of links pointing to a group, and how those might be determined to be independent or not, and then the patent discusses referring queries, and starts off with this passage:

    The system determines a count of reference queries for the group (step 304). A reference query for a group of resources is a search query that has been submitted to a search engine and has been classified as referring to a resource in the group. A query can be classified as referring to a particular resource if the query includes a term that is recognized by the system as referring to the particular resource. For example, a term that refers to a resource may be all of or a portion of a resource identifier, e.g., the URL, for the resource. For example, the term “example.com” may be a term that is recognized as referring to the home page of that domain, e.g., the resource whose URL is “http://www.example.com”. Thus, search queries including the term “example.com” can be classified as referring to that home page. As another example, if the system has data indicating that the terms “example sf” and “esf” are commonly used by users to refer to the resource whose URL is “http://www.sf.example.com,” queries that contain the terms “example sf” or “esf”, e.g., the queries “example sf news” and “esf restaurant reviews,” can be counted as reference queries for the group that includes the resource whose URL is “http://www.sf.example.com.”

    The patent then discusses some analysis of whether or not a page being linked to is a navigational result for the anchor text used.

    There are definitely some interesting insights in this patent into how Google may be looking at links from one page to another, and they’re worth reading and thinking about for anyone doing SEO.

  5. Thanks, Hardik

    It looks like he has updated his Google+ About page to remove that “Bragging Right”. I’m happy I took a screenshot this morning to share before he did. :)

  6. Hi Kieran

    Given Google’s statements about 600+ updates to their ranking algorithms a year, it’s definitely possible that the process described in the patent has changed. There’s no guarantee that Google followed the process in the patent step-by-step anyway. I believe that Matt Cutts even noted on his blog a few years ago that when Google first implemented PageRank, it started changing almost immediately as it was being used. So yes, if Google is using the process described in this patent, chances are good that those techniques have changed in some ways.

  7. Hi Bill,

    Nice info. I had nearly forgotten what the Panda Update was about. By the way, nice new layout here; it looks much better than before!

    Greetings

    Gretus

  8. Thanks, Gretus.

    I had some issues with the compatibility of a plugin and my last theme, which was most easily solved by updating the theme and giving it a new design. Happy you like the new look. :)

  9. So much info to read through on that patent.

    “Determining a respective group-specific modification factor for a particular group of resources can include: determining an initial modification factor for the particular group of resources, wherein the initial modification factor is a ratio of a number of independent links counted for the particular group to the number of reference queries counted for the particular group”

    I don’t have as much experience reading through patents as you, but does the above suggest a correlation between the number of independent links to a site and the number of search queries (reference queries) that are directly relevant to that site?
    I might be reading it wrong, but I find it fascinating to read through the patents being published!

    Does the initial score relate to, basically, how sites are ranked?

  10. The math in the patent matches what we have learned about the Panda algorithm, and some of the justifications also point to the Panda downgrade process. The last descriptive paragraph certainly points toward that algorithm:

    “The subject matter described in this specification can be implemented in particular embodiments so as to realize one or more of the following advantages. Search results identifying low-quality resources can be demoted in a presentation order of search results returned in response to a user’s query. Thus, the user experience can be improved because search results higher in the presentation order will better match the user’s informational needs. ”

    Whether this is the whole shebang or just a part of it is not really clear to me. But it works the way Danny Sullivan and I have speculated that the Panda algorithm should work (computing a score that is added to other scores).

    This could, of course, be a “failed” approach that Google felt was worth protecting. People should be careful about drawing too many conclusions.

  11. I have to agree with you, Bill. There’s no way the patent as published and the current algorithm share much one-for-one similarity. It is interesting to see how they thought this out, to get a bit more insight into what they did, and to learn the naming rationale! lol. Are we truly moving closer to a completely non-gameable resource with Panda, or are we still looking toward the next big thing on the horizon before we can hope that good work triumphs over tricks and sneaks?

  12. I’m going to throw my hat in here and say I reckon this isn’t Panda, or at least if it is then it’s only a distinct part of Panda.

    To me it sounds more characteristic of a classifier acting as part of Penguin TBH.

    It strikes me that the same or similar engineers are likely working on these spam algos.

  13. Bill,

    That’s a great discovery. I never thought the name of the update would have been inspired by him. I’m sure many of the SEOs and website owners who were hit by Panda must be searching for him :P :)

    Thanks for the share.
    Harsh

  14. “The patent appears to describe some ranking of pages based upon classifying them by looking at the links pointing to them, the queries that refer to the pages, and how well the pages fit as navigational queries for those queries.”

    To elaborate on Bill’s summary, the patent specifically mentions ranking modifications for sites (“groups”) based on measures of a site’s backlink count, backlink independence, and query count. The ratios between those measures are the trigger.

  15. It would be interesting to see how the segment you outlined above links to the 23 questions they told publishers to review shortly after Panda rolled out. From the segment above (and what I understand of it), it looks very link / query focussed. How do questions such as “How much quality control is done on content?” play into this patent?

  16. “How do questions such as ‘How much quality control is done on content?’ play into this patent?”

    The answers to those questions would not have been used by the algorithm but rather by the engineers to divide their learning set of Websites into “high quality” and “low quality” sites.

  17. Hi Bill,
    The new theme looks great! If I’m not wrong, the above patent has nothing to do with the Panda update. As far as I remember, that bragging right was not there a month ago. At that time I wanted some information about Navneet, so I checked his G+ profile, and I don’t think the “author of Panda update” part was there. The interesting thing I learned is that his school was just 3 miles away from mine! Most interesting of all, many students from that school don’t even remember his name. That’s definitely not their fault, as they aren’t aware of SEO. Waiting for a detailed post on the patent soon from you.

  18. Hi Dillip,

    I believe that the bragging rights were there at least a month ago, which was the first time I saw them.

    The more I read this, and think back to the purpose behind Panda, the more it looks like it is the Panda update.

  19. Great stuff, Bill. This may be an incredibly dense question with little or no relationship to the metrics employed by Panda, so I apologize if I’m way off base here. My question is: I do a decent amount of writing that gets published online and I’m a prolific citer. Maybe some of those journalism classes I took have a long reach but I’m just not really comfortable citing facts, figures, stats, obviously images, excerpts or even paraphrasing from other sources, etc., without giving them linked attribution. Does the NUMBER of links in an article trip any of the Panda defenses, even if they’re all high authority or well respected?
    Thanks.

  20. Hey Bill, do you plan to do a more in-depth follow-up on this? Also, is there somewhere online where the patent can be read in full for free?

  21. Hi Ian

    I’m pretty sad that no one has tried to publish something that describes what is in the patent in more detail. There is a link in the middle of my post that goes directly to the full patent, where it can be read for free. Here’s another link directly to it: Ranking Search Results.

    I am planning on a more in-depth follow-up on the patent, but I think there’s a lot of value in having others read the patent and look at what it covers, and try to decide why it’s written the way it is. Why the discussion within it on “independent links”? Why the section about referring queries (remember that Panda was originally targeting content farms and low quality content)? Why pull in the classification check involving navigational queries?

  22. “Why the discussion within it on ‘independent links’? Why the section about referring queries (remember that Panda was originally targeting content farms and low quality content)? Why pull in the classification check involving navigational queries?”

    I can’t say I have much of an opinion one way or another but if we assume a model where two Websites cover the same topic+event/story, one may be more likely to attract attention due to the quality of its presentation than the other.

    Hence, the signals coming from other Websites may correlate with the higher quality site.

    Example: People who use navigational queries are implying that they trust a specific site.

    Example: A high number of citations/links pointing to a story may indicate it is likely the most important source of information.

  23. Hi Bill,
    Sorry, I didn’t check that patent. I just feel that way. Anyway, I’ll read it to assess the depth of Panda, as I feel Panda plays, and will continue to play, an important part in their algorithm for providing high-quality content in response to search queries.

  24. Hi Dilip,

    If you didn’t even try to take a look at the patent, I’m not sure that you’re entitled to even voice an opinion about it. :)

    Even if I do write something up, you’re better off trying to read it yourself than to rely upon what I write.

  25. Hi Bill,
    Relying on your write-ups has been my practice, as you said. But I’m going to follow your wise words and from now on will try to read the patents myself. Thanks for this great piece of advice.

  26. It’s crazy that people are so obsessed with the name. It’s another update by Google that you are going to have to navigate and learn about.

  27. Definitely some good tidbits in here. Looks like it pulls links & queries then creates a modification factor… Some things to note (if I interpret them correctly):

    It gives examples of how it could be applied. I think these examples are important, as they are probably scenarios it may be used for, or they describe methods that are already in use.

    - Talks about how “the system calculates an independence score”
    - Talks about determining the count of incoming links as “express links, implied links, or both,” and clarifies that these can be actual hyperlinks or plain text.
    - Links can be counted as not independent if “owned by the same entity, hosted by the same entity, or that were created by the same entity.”

  28. The Panda Update has been a big leap for Google toward quality results. It was the best thing that happened for surfers who were tired of seeing spam ranking everywhere. Great post as always, and I also love the new design.

  29. Thanks, Dan.

    Panda does seem like a big step forward, but I suspect that many people who were impacted by it would disagree.

  30. It is interesting that it “counts” the number of reference queries. I wonder whether that means it counts “links,” or the number of times users search (submit queries) for specific content? Also, this patent uses the word “plurality” often. What function in the code would that be talking about, do you think?

  31. Hey Bill. Really great info as always here. Panda is still on the radar and still causing issues – surprising for something apparently so ‘cute and cuddly’ don’t you think? :)
    Neither we nor our clients were affected in any great way by any of the changes, because we have always done everything in a methodical, ethical and non-paid manner. While we didn’t instantly see great Google results (around 8 years ago, when we started Metalfrog), now we do. Proof that you cannot cheat the engines, if ever it was needed!
