Authority vs. Popularity in Search Engine Rankings

When search engines return web pages in search results in response to a query, most people assume that the pages being shown are the ones that a search engine has decided are the “best” pages in response to their search terms. But what does the word “best” mean in that context? The search engines attempt to show pages that are both relevant to the query (and the intent of a searcher) and popular.

Google’s PageRank algorithm is a popularity algorithm based upon a citation analysis approach to finding pages, or as Google Founder Larry Page noted in Improved Text Searching in Hypertext Systems (pdf):

The intuition is that if your query matches tens of thousands of documents, you would be happier looking at documents that many people thought to mention in their web pages, or that people who had important pages mentioned at least a few times.
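To make that intuition concrete, here is a minimal sketch of a PageRank-style calculation. It is not Google's actual implementation; the damping factor of 0.85, the fixed iteration count, and the toy set of pages are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: nobody links to the journal, many link to the mainstream page.
links = {
    "journal": ["nasa"],
    "nasa": ["mainstream"],
    "blog_a": ["mainstream"],
    "blog_b": ["mainstream"],
    "mainstream": ["nasa"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy graph the journal ends up with the lowest possible score because nothing links to it, regardless of how good its content is, which is the weakness of a pure popularity measure.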

There are other ways of measuring popularity that the search engines may be using as well, such as the number of times that a document has been read, or the number of times that it might have been linked to or mentioned or shared on a social network, or selected when shown in a set of search results. A couple of Microsoft patent applications filed this month question the wisdom of using popularity as a way of ranking pages, and tell us that:

The popularity of a particular document, however, does not necessarily indicate that the document is relevant to the search query, or that the document is associated with sources that are considered reliable with respect to the subject matter of the document.

For example, let’s say that you’re looking for the best information that you can find about how gravity works around a black hole. It’s possible that the best information you could find might be found in a scientific journal that specializes in the behavior of black holes. The articles at that journal may even be written by some of the world’s foremost experts on Astronomy for a very scientifically literate audience. Chances are that if you performed a search at Google or Yahoo or Bing for that topic, even if that particular journal was open to the public and freely accessible via search engines, you would see much more mainstream pages written for a much wider audience near the top of search results instead of the journal.

Those mainstream articles likely have many more links pointing to them than the journal written for scientists. They likely have highly popular pages linking to them from news sources, from government agencies like NASA, and from other more mainstream sites that report about science. While pages that are popular can often be useful and informative pages, they may not be the most authoritative pages that could be shown in response to a query.

Author Authority Ranking

The Microsoft patents describe a system for scoring pages based upon an author’s authority, and for reranking search results based upon those authority scores.

We’re told in the patents that the term “authority” refers to the following characteristics about an author or source of information as might be associated with that author or source in response to a particular topic:

  • Trustworthiness
  • Reliability
  • Knowledgeability
  • Respect

In a few ways, this authority ranking approach reminded me of a recent Microsoft paper about determining the credibility of resources on the Web that I wrote about in How a Search Engine Might Visualize and Rerank Web Pages Based Upon Credibility. The focus of that paper was upon assessing the credibility of websites rather than particular authors, however.

Whether an author might be authoritative on a topic could be determined by looking at data associated with the author, such as:

  • Educational degrees held by the source
  • Where those degrees were obtained
  • Citations of the source in scholarly or technical works
  • Number of publications associated with a source
  • Number of social network connections and/or followers
  • Whether or not the source is employed by and/or graduated from a well respected and/or highly cited institution
  • Social networking information such as a number of posts relating to the source and/or a particular topic addressed by the source
  • Number of patents held by the source
  • Number of links to content associated with the source
  • Number of articles citing work associated with the source
  • Ratings and Reviews associated with the source

Content and specific sites from specific sources might be determined to be authoritative about specific topics, and if a query that someone searches for may also be associated with that topic, then pages from that source might be boosted in search results based upon that perceived authority.

Here’s a screenshot of a table from the second patent filing that shows authority scores and some potential influences on those scores:

A table from the second patent filing showing authority scores and some possible influences on those scores.

The patent applications are:

Authority Ranking
Invented by Susan T. Dumais, Stefan David Weitz, Alexander George Gounares, David James Gemmell and Paul Yiu
Assigned to Microsoft Corporation
US Patent Application 20110246484
Published October 6, 2011
Filed: April 1, 2010

Abstract

Concepts and technologies are described herein for authority ranking for real time and social search. An authority index configured to store data relating to sources is generated. Data relating to the sources, including an authority value, are generated and stored at the authority index. The authority value may be defined as a function of source, topic, and point of view (“POV”), as well as other data, if desired, and may be determined based upon one or more ranking functions.

The ranking functions are determined, and data corresponding to the ranking functions is obtained. Each of the ranking functions may be weighted according to a weighting function, a confidence value or interval, one or more time functions, and/or other methods. The obtained authority value may be used for affecting ranking of search results or for other purposes.
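As a rough illustration of what the abstract describes, here is a minimal sketch of an authority index keyed by source, topic, and point of view, with an authority value computed from weighted ranking functions. The patent filing doesn't give a formula, so the weighted average, the confidence multiplier, the signal names, and the sample numbers below are all my own assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RankingFunction:
    """One signal about a source (hypothetical structure, not from the patent)."""
    name: str
    score: Callable[[dict], float]  # maps a source record to a value in [0, 1]
    weight: float                   # how much this signal matters
    confidence: float               # how much we trust the underlying data, 0..1


def authority_value(record: dict, functions: List[RankingFunction]) -> float:
    """Combine ranking functions into one value (assumed: weighted average)."""
    total = sum(f.weight * f.confidence for f in functions)
    if total == 0:
        return 0.0
    weighted = sum(f.weight * f.confidence * f.score(record) for f in functions)
    return weighted / total


# Hypothetical signals, capped so that each score stays between 0 and 1.
functions = [
    RankingFunction("citations", lambda r: min(r.get("citations", 0) / 100, 1.0),
                    weight=3.0, confidence=0.9),
    RankingFunction("followers", lambda r: min(r.get("followers", 0) / 10000, 1.0),
                    weight=1.0, confidence=0.5),
]

# The authority index is keyed by (source, topic, point of view).
authority_index: Dict[Tuple[str, str, str], float] = {}
record = {"citations": 120, "followers": 500}
authority_index[("Dr. Example", "black holes", "academic")] = authority_value(record, functions)

value = authority_index[("Dr. Example", "black holes", "academic")]
print(round(value, 4))
```

Because the value is stored per topic, the same source could have a high score for black holes and a low one for, say, web design. The time functions that the abstract mentions are left out of this sketch.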

Dynamic Reranking of Search Results Based upon Source Authority
Invented by Stefan David Weitz, Alexander George Gounares, and Patrick A. Kinsel
Assigned to Microsoft
US Patent Application 20110246456
Published October 6, 2011
Filed: April 1, 2010

Abstract

Concepts and technologies are described herein for dynamically reranking search results based upon source authority. A search query is received and analyzed. One or more topics are identified in the search query. An authority index is searched to identify authoritative sources for content relating to the identified topic(s). Promoted results corresponding to content generated by the authoritative sources relating to the identified topics are obtained.

The promoted results can be presented to an entity requesting the search, or injected into search results. Contribution dimensions associated with the promoted results can be determined, and filters based upon the contribution dimensions can be generated and used by an entity to dynamically manipulate the search results.
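The reranking step might look something like the following sketch. It assumes that topics have already been identified in the query, that the authority index is a simple lookup keyed by source and topic, and that authority is applied as a multiplicative boost to the original relevance score. None of those details come from the patent filing; the URLs, scores, and `boost` parameter are made up for illustration.

```python
def rerank(results, query_topics, authority_index, boost=1.0):
    """Reorder results so that authoritative sources on the query's topics rise.

    results: list of dicts with "url", "source", and "score" (original relevance).
    authority_index: dict mapping (source, topic) to an authority value in [0, 1].
    The multiplicative boost is an assumption, not the patent's formula.
    """
    def adjusted_score(result):
        # Use the source's best authority value across the query's topics.
        authority = max(
            (authority_index.get((result["source"], topic), 0.0) for topic in query_topics),
            default=0.0,
        )
        return result["score"] * (1.0 + boost * authority)

    return sorted(results, key=adjusted_score, reverse=True)


# Hypothetical results for a query about black holes.
results = [
    {"url": "news.example/black-holes", "source": "news_site", "score": 0.9},
    {"url": "journal.example/black-holes", "source": "astro_journal", "score": 0.6},
]
authority_index = {
    ("astro_journal", "black holes"): 0.9,
    ("news_site", "black holes"): 0.2,
}

reranked = rerank(results, ["black holes"], authority_index)
for result in reranked:
    print(result["url"])
```

With these numbers the journal moves ahead of the more popular news page. When no topic is identified in the query, the original ordering is left alone.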

The patents describe in more detail how they would look at contributions by and interactions between a source (a person, an organization, a business, etc.) and others at places like Facebook and Twitter, and at ratings and reviews for that source. They discuss learning about relationships between individuals and websites, businesses, educational institutions, and more.

Data about a “source” might be identified explicitly through author bylines (sound a little like Google’s authorship markup approach?), or through places they might be explicitly tied to in some way, such as institutions, publications, or domain names.

The patent filings point to other types of data that might be collected and associated with a source as well, such as:

  • Gender of a source
  • Country of origin associated with the source
  • Language associated with the source, entities and/or other sources related to the source
  • Type of content associated with the source
  • Ranking or rating data
  • Descriptions of content associated with the source
  • Number of words in the content
  • Version number associated with the content
  • Copyright date of the content

Pages that are promoted within search results might be presented separately from more conventional search results, or they may be injected within those results.

Conclusion

In many ways, Microsoft’s approach towards providing an authority score for authors or sources sounds like what Google is trying to do with their authorship markup, though we haven’t been given much in the way of details by Google about how and why some authors’ pages or microblog posts might be ranked in search results. We have been given some hints though, that I’ve written about in places like the following posts:

Will we see a similar approach from Microsoft that might involve authorship markup, or that may take more advantage of the relationship between Bing and Facebook, or both?

One question that I have is whether the approach to authority ranking described in the Microsoft patent applications is useful. Are degrees and numbers of patents granted or papers published useful signs of authority? Are there sometimes more authoritative sources who have degrees from less well known educational institutions? Numbers of links on other pages, and numbers of followers in social networks still seem to be important under this approach.

But the patent also looks at the kinds of interactions that authors might have with others, and other information that isn’t tied to popularity as well.

Google’s use of authorship markup also seems aimed at increasing “authority” as a ranking signal, though it’s interesting that next to authorship profile images shown in search results, Google is showing “how many circles” someone is in, which seems to be more popularity based than authority based.

38 thoughts on “Authority vs. Popularity in Search Engine Rankings”

  1. Unfortunately, Google is not the be-all, end-all for everyone as much as they might try. “Relevance” is really hard to quantify and that is why I think that different search engine versions are a real necessity. I personally use the three majors to see relevant results and filter from there, but as in your example of the black hole research, I tend to go towards other engines like Wolfram Alpha – of course Google would tell me they already have it handled with Google Scholar…but I like some non-G options.

  2. There are three things that need to be considered :

    1. Authority and Factual accuracy – which will be taken care by the factors you mention;

    2. Relevancy – One might want to know about a pebble in a sea and there is no point in taking him to an article that describes the entire sea (of which a pebble is a small part);

    3. Presentation / Readability – Let’s accept one thing : All those academic journals written by highly acclaimed scientists with double PhDs are hopelessly complex and very difficult for normal people to read and understand. Even Wiki sometimes is. So, it’s important to gauge how much a person likes an article / reads through an article while presenting them to other users.

  3. This is the good stuff. Thanks Bill. It seems even the Engines are having a hard time trying to separate the wheat from the chaff. In its simplest form, results are Quality = Authority + Popularity. But this begs the question, “How Does The Search Engine determine the intent of the user?” G tries to get you logged in and develop a web history, I’m sure M$ wants historical data about the user too. Determining User Intent and returning relevant results is a much tougher nut to crack. I’d like to see Google develop Instant combined with Advanced Search to guide me to a better query. THAT would solve the whole problem in determining Intent for a given Search.

  4. Popularity also means quality content. Practically, good and reliable content becomes, over time, an authority in that field. The problem is that content becomes popular through the feedback various users give, so such content needs to answer general needs, while the particularities are often regarded as niche.

  5. I think you’re onto the right thing when you point out the fact of “popularity” being based on linking which most likely means content for a wider audience is going to be on top of the SERPs. However, with the Social Graph and other personalised search coming into play it’s an interesting debate as how these may influence. A scientist, LOGGED IN to their gmail, may be seeing much different results than Joe and Jane Public. But isn’t that the point of all that Google is trying to do lately with Social, Google + and even the rel=author tag? It may not be the right path to show results that have your friends/contacts adding to their “relevance” but this is another form of “popularity” that they seem to be gearing up to take into account.

  6. Hi iMark Media,

    Relevance does come in a number of different flavors, from a straight keyword matching to attempts to fulfill an informational or situational need that doesn’t necessarily need all the keywords, such as showing listings for local pizza places on a query for [pizza] which seems to have a searcher intent of finding a place to get lunch nearby.

    I would probably be happier with the results at Google Scholar for my black hole query, but should I be required to perform a vertical search or a search at a different search engine to find the site I want? That might be part of the intent behind Universal Search, to fill in gaps when something like a more scholarly result might be wanted, but I think we are going to see more of a role for authority in ranking webpages from Google and Microsoft.

  7. Hi Raj,

    Good points, and I like how you’ve broken those factors down.

    My scholarly example was probably a little extreme, but I think that these Microsoft patents and Google’s movement toward authorship both illustrate that we are going to see authority of an author as more of a factor in ranking web pages.

    It’s interesting that Google has integrated Google Scholar into regular search results, and we are seeing things like being able to sort results based upon readability making their way into search results.

  8. Hi Dave,

    Thank you very much.

    Intent is hard. Bing is supposed to be a “decision engine” because of its ability to let you interact with search results and make decisions about the kinds of things that you see with query refinements, blended results, results grouped in categories, and more. But I see Google doing a lot of the same things as well.

    I’m not sure that Web History always does a great job at helping with intent, especially when the intent behind a search is to help fulfill a situational need. For instance, the zoologist who has visited a number of biology sites in the past, and wants to find out how well the Jaguars played last weekend, might not see sports scores at the top of search results when he types [jaguar] into Google. Your suggestion about Google Instant and advanced search might help there if he’s offered [jacksonville jaguars] as a suggested query. :)

  9. Hi Mia,

    Popularity doesn’t always necessarily equate with quality, though there are times when they will overlap. I do agree with you that there are times when authorities might not always suit mainstream feedback which can place the more accessible over higher authorities. I think considering both authority and popularity and blending them together as options within search results isn’t necessarily a bad compromise.

  10. Hi Eldad,

    Presenting searchers with some different flavors of search results with some possibly relying more on popularity, and others relying more upon authority might be the best we achieve. I do think that the combination of social, Google +, and authorship markup helps bring us closer to authority measures helping shape the search results that we see.

    And there are times when your friends’ results adding their “relevance” to results is helpful, especially when you’re trying to get the kinds of answers that they might be the best people to give.

  11. Interesting information. The new search engine world is going to look a lot different than the current one. So much for “LOLCats” and hello Harvard-educated peer-reviewed research scientists. Authorship-abuse could be the new growth market if it becomes a huge factor.

  12. Popularity gives authority. From my point of view, they both work together. You wouldn’t achieve one without the other. They both come from valuable, quality content.

  13. i think both authority and popularity are interrelated. most of the times, authority sites become popular because of their reliable content, on the other hand, popular sites can also become authority sites since a lot of people are visiting or accessing them. in linking to sites, for me, it’s important to consider both

  14. Hi Darren,

    Any new ranking factor that becomes more prominent is always going to suffer some kind of attempts to manipulate and attack them. Search engines have been trying to respond to that by making it much more costly to do so, and in this case it means that it takes some actual work to create author profiles, to engage in a number of meaningful interactions with others, and efforts towards creating a “fake” authority become more time consuming and of less value.

  15. Hi John,

    There often is some kind of overlap between authority and popularity, but one definitely isn’t the other. For example, Justin Bieber may be very popular these days, but I’m not going to rely upon him for suggestions or advice on how to design a web page, fix a car, perform surgery, solve a math problem, write a blog post, and so on. Paris Hilton is even more popular, and someone I’d be even less inclined to rely upon for advice upon almost anything.

  16. Hi Rob,

    There are many authoritative sites that aren’t very popular, and many popular sites that aren’t very authoritative. For example, wikipedia is a very popular site, but while I might start performing research there on many subjects, I’m usually going to follow that up with research at much more authoritative sites.

  17. Very interesting thoughts, maybe this is one of the reasons why Google started Google+, to get more information about what people like and share…

  18. Hi Tommy,

    Google has had some varying degrees of success and failure with social networks and social sharing, such as Orkut. Google Plus seems to have been created in part with some kind of ability to use reputation scores and authority ranking in mind, and a way for Google to integrate that kind of social networking with other services from Google such as search. Google doesn’t have control or access to a lot of information that might be tied to tweets at Twitter or Status updates at Facebook, so it is quite possible that this is one of the motivations behind their starting Google Plus.

  19. Wikipedia is a perfect example of how you shouldn’t consider it an encyclopedia, where they have obviously combined and tricked more than a few individuals. Many of my students still do a superficial search and consider Wiki to be the definitive answer. It’s the McDonald’s of information sites; quick, cheap, and filled with junk…but we continue to visit it from time to time.

  20. @Josh: Yet, the big G really likes Wiki and always ranks their pages on page 1 for most search queries. Hell, the big G even delivers 64 Million visitors each month to Wiki according to one of the articles I read recently. What does that tell us then?

    Bill, I’ve been an avid follower of your site for quite a while but never really commented on any of your posts but today I just wanted to say: keep being awesome! Your posts are always inspiring and make me want to learn more.. :)

  21. Hi Josh,

    Wikipedia really doesn’t like “independent research” or opinions, even if the people posting that are actually experts on the topic.

    My favorite example of that is when Jaron Lanier tried to edit a wikipedia page about himself that indicated he was a film maker because he had once created a short experimental film some years before. He’s a noted inventor, author, teacher, and many other things, but doesn’t consider himself a film maker. When he went to edit the wikipedia page about himself, the editors at wikipedia became somewhat hostile. See: http://edge.org/3rd_culture/lanier06/lanier06_index.html

  22. Hi Ragil,

    Thanks for your kind words, and for deciding to comment. I’m really happy to hear that you’ve gotten so much out of my posts here.

    Interesting thoughts on Wikipedia and authority.

    Wikipedia has implemented some really smart approaches to SEO on their site, which accounts for a lot of their success on Google. The fact that the site is user generated, and a lot of people have participated in it means that there are a lot of folks who have some interest in its success and link liberally to it. It’s also a convenient site for people to link to on many topics as well, and that’s part of its popularity in search results.

    I suspect that if services like Google Plus can help Google incorporate more “authority” into the signals they use to rank pages that we will see more sites that are more authoritative start rising above wikipedia. I can hope, can’t I? :)

  23. Does Google consider social shares as well to measure a site’s popularity? What about Facebook statuses then? Most people set their profiles as private, so no search engine can really know what the user has posted on their Facebook.

    And for the page rank metrics, which one does Google value more: the number of times a page has been linked to, or the quality (page rank) of pages that link to a webpage?

  24. Hi Sashwat,

    There are a lot of signs that Google is experimenting with the use of social signals to measure a site’s popularity. When you are logged into Google, you can see the impact of when you +1 a page on a site for instance, because you’ll end up seeing that page higher in search results for queries that it appears for.

    We have less of an idea about how and whether Google is using those types of social signals to rank pages independently of the personalization that impacts you when you are signed in. Do social signals such as a +1 impact the rankings that everyone sees, even if they aren’t signed into their Google Accounts? It’s possible that they may at this point, and there’s a good chance that they will somehow in the future, though it’s likely that Google is doing a lot of experimentation at this point regarding how exactly they might use those signals.

    Regarding Google using signals from Google Plus over signals from Facebook or Twitter, it’s possible that Google is looking at all the social signals that it has access to, but may give different weights for different signals from each service, and for different users as well. Google has more access to information about users of Google Plus since it can track information about those users, such as their activities on Google Plus, on YouTube, on Blogger, and so on. It can see whether or not they’ve set up authorship markup to tie content they create on their own sites to their Google Profiles as well.

    Under a PageRank system, Google has always valued the quality of a link over quantities of links. For instance, a single link from the front page of the New York Times could be worth the same as many thousands of links from fairly new blogspot blogs. For some kind of “social” rank, it’s possible that a mention or link or citation from someone who has a fairly high “authority” or user rank might be worth as much as many mentions or links or citations from people who don’t have much authority at this point.

  25. I was going to mention Wiki too. The results are swayed towards this site. Is it the best place for information or are there better sources out there? Google don’t seem to think so as this is one thing I have read over and over, which is Wiki always appearing at the top.

    Regardless of anything. I think you have to ask yourself a question. Are you happy with the results that google returns and the reasons it is returning it. In my niche, I have to say the results are pretty poor especially when you get to 2nd and 3rd pages of the results.

    Everything can be swayed. If I somehow get great links from high Authority sites and my site is half decent, does that mean I should be high up the SERPS?

    Even with Social Signals. These can be bought and manipulated.

    I’ve often thought about how I would give rank to a site. I’ll come back to you on that.

  26. Hi Darius,

    Wikipedia (I believe that’s what you are referring to) has a very large number of links to it, follows some very good practices in terms of linking between Wikipedia entries, and some very smart classification approaches. They use templates in very useful ways for search engines that like to extract structured information from templates. There are a lot of very good SEO practices that can be learned from Wikipedia.

    Chances are that Wikipedia isn’t the best source of information for someone who wants an expert’s opinion on a subject. It’s not a bad starting point when you want to learn about something, but as a group edited site that discourages expertise, research, and unique opinions, it shouldn’t be the ending point of any research on any topic.

    Sometimes the failure to find relevant information on the Web isn’t the failure of a search engine, but rather a failing of the Web. There are many sources of good information that follow very bad practices in presenting that information within the framework of the web, such as navigation that requires JavaScript or cookies, which search engines often can’t follow. There are many sources of information that require subscriptions and/or logins to access. There are repositories of data that can only be accessed through searches on forms.

    This post describes an approach to ranking pages that looks at different signals of authority, and relies less upon signals dependent upon popularity (such as links). Search engines are constantly evaluating different ranking signals and approaches to finding information, and improving the quality of their results. They understand that there are people who will work to take advantage of those signals as well, in attempts to manipulate them for their own personal gain.

  27. This entire setup frustrates me. Often Google’s SERPs will display pages from sites such as eBay and Amazon, even though they aren’t relevant. This only happens because these sites have a high authority. I wish Google would re-evaluate what it deems relevant.

  28. Hi Marissa,

    I think relevance still plays a pretty strong role in how pages are ranked. What these patents are aiming at is providing a wider range of signals to judge the authority of the content on a page than just popularity, or the PageRank of pages based upon links to a site. Google is also attempting to broaden the signals it looks at to avoid the kinds of problems that you’ve pointed out.

  29. I think “popular” should be assessed on how people got there, which was @John’s point about them going hand in hand. If 90% of folks click on link #1 in Google that doesn’t make it popular. It might be crap. But if it’s #1 in Google it had to get there somehow. That being said, maybe a weight of Google-referred documents is ideal for determining popularity. If I make it to page 3 and click on a link it probably has a good title or description, which would seem to increase its weight more than just blindly clicking link #1.

  30. Hi Thomas,

    The point of the Microsoft patent is to look for signals that indicate some level of authority that provides an independent set of ranking signals that have nothing to do with things like who linked to whom. PageRank, as a ranking signal, doesn’t look at signals like those, and might lead a page to rank very highly if that page has sufficient links from highly ranked pages. There’s an assumption of quality or authority based upon that popularity, but it’s just an assumption. What the Microsoft approach attempts to do is to prevent giving too much weight to an assumption like that.

    Click throughs in search results are also often described by search engineers as a popularity metric, though I agree with you that the quality of a title and meta description is something that should be considered. Still, pages at the top of search results on the front page of those results tend to get a lot of clicks based upon another assumption on the part of searchers that if a page ranks well, it must be a good page.

    The Microsoft patent says, let’s look at some other signals, including some that might be offline signals, or signals that might not be available on the page itself that might give us an idea of how “authoritative” the author of some content might be. I’m not saying that’s a perfect approach, but it is something for us to think about, and wonder if Microsoft and other search engines might adopt something like it. Google Plus and authorship markup associated with content creates the possibility that Google could start looking at more signals like that, such as where someone works, what types of degrees they might hold, where else they’ve been published, and more.

    The Microsoft patent does also include some “authority” signals that could be considered popularity signals as well, such as the number of followers in a social network, so that indicates that they see some level of popularity imparting authority, as well.

  31. I hope I can explain what authority and popularity mean to me:
    I know many web designers from Germany, so I created a circle, titled it “WebDesign” and followed them.

    But these days I’ve unfollowed most of them, because I’ve read “cat content,” information about a new party, but absolutely no content about web design, simply nothing since July 2011.

    They’ve got many +1s and have great popularity, but they no longer have web-design authority for me.

    So I can’t understand how the level of popularity imparts authority. This doesn’t make sense to me.

    Or have I misunderstood something?

  32. Hi Monika,

    In this particular instance, we’re working with Microsoft’s definitions of authority and popularity, and in the patents I’ve written about it seems that they are focusing upon what they call “source authority,” as in trying to make sure that when some content comes from a particular source, there are signals that we can look at to try to determine if they have the authority to discuss the topics that they do. That’s why they discuss looking at things like educational degrees, citations in peer-reviewed publications, patents granted, and so on.

    Systems like PageRank are more of a popularity signal since they focus primarily upon the quantity and quality of links pointed to a page. Looking at things like how many people click on a particular result in search results is also more of a popularity signal than an authority one since the numbers of clicks are more important than who it is who might be clicking.

    But things like Google’s authorship markup start connecting social sharing and endorsements and so on with people who have Google Profiles, information about themselves such as where they are employed and where they went to school and so on in those profiles, and their interactions with others can be looked at to see if they have some expertise or authority in different subjects. The “who” that shares something or endorses it becomes important that way – it’s a move towards ranking things based more on authority and less on popularity.
