Microsoft Weighs in on Ranking Authors in Social Networks

Author Ranking in social media is more than just a popularity contest, and can include things like how frequently an author surfaces content that subsequently becomes popular, topical authority on different subjects, and popularity and influence signals.

Author Authority to Distinguish Signal From Noise?

Social media contains a lot of signal, and a lot of what might be considered noise. Within social streams of real time communication such as tweets and status updates and blog posts is information that can be invaluable on many different topics.

How does a search engine pick out which authors are actual authorities on different topics, and which are sharing and resending and adding to authoritative content? How does it tell which authors are piggybacking off such content, and which authors just really aren’t authorities on any given topic?

Some authors aren’t even real people, but instead exist as spam and/or aggregator accounts, adding little or no value to other members of a social network.

A patent application from Microsoft tells us that the value among social streams hinges on finding the most authoritative users on given topics.

Identifying Social Authority

So what exactly is social authority, and how is it developed, identified, and measured? How does someone establish themselves as an expert in a given field?

Social engagement signals may be used to rank user content based upon social graph metrics like number of followers and the number of times a user’s content has been shared, but those signals can be prone to simple spamming, or dominated by celebrities like Lady Gaga or Justin Bieber.

The Microsoft patent filing describes a way of determining an Author Rank in social media systems that looks to a variety of statistical approaches applying usage metrics and social and topical graph characteristics, such as the following:

  • A temporal analysis of link sharing where authority is based on a user’s propensity to link early to Web pages that become popular
  • A topical authority based on an author’s links and content updates in specific topic areas
  • A popularity and influence measure based on properties of the authors such as:
    • Number of followers
    • Number of posts such as microblogs resent
    • Mention counts in which an author is mentioned
    • Number of on-line friends an author has

    The pending patent is:

    Ranking Authors in Social Media Systems
    Invented by Peter Richard Bailey, Chad Carson, Scott Joseph Counts, Nikhil Bharat Dandekar, Ho John Lee, Shubha Umesh Nabar, Aditya Pal, Michael Ching, Paul Alexander Dow, Shuang Guo, and Seo Hyun-Ju
    Assigned to Microsoft
    US Patent Application 20120117059
    Published May 10, 2012
    Filed November 9, 2010

    Abstract

    The author ranking technique described herein is a technique to rank authors in social media systems along various dimensions, using a variety of statistical methods for utilizing those dimensions.
    More particularly, the technique ranks authors in social media systems through a combination of statistical techniques that leverage usage metrics, and social and topical graph characteristics. In various exemplary embodiments, the technique can rank author authority by the following:

    • 1) temporal analysis of link sharing in which authority is computed based on a user’s propensity to provide early links to web pages that subsequently become popular;
    • 2) topical authority based on the author’s links and content updates in specific topic areas; and
    • 3) popularity and influence based on nodal properties of authors.

    Everyone’s an Authority, and No One Is?

    In social media settings, we are both the producers of content, and its consumers. We can send out our thoughts and ideas, conversation openers and responses, mentions and questions to specific individuals, or to people we are connected to, or even to everyone at once. We can reshare the content others send out as it is, or transformed in some manner.

    With so many people participating in social media, there’s often a great diversity of voices on any given topic. How does a search engine that wants to surface some of this content in a set of social search results decide what to show and what to leave out? How does it find true authorities, who are both interesting and authoritative on a topic?

    Identifying true authorities can be tricky because of the presence of highly visible, overly general authorities who might have many followers, such as popular mainstream sources. Those may be authoritative, but may not be the source of original or expert content on a topic.

    The patent application tells us that end users likely are looking for a mix that includes these larger organizations along with lesser known authors who may be the first to write about a particular topic, or have some special insight or analysis.

    Those lesser known authors might be highly authoritative, but might not be well known, and may not have high follower counts, or may not even have produced a lot of content on a particular topic.

    We’re also told that algorithms similar to PageRank, run over a social graph of users, are likely to be overly sensitive to celebrities and insufficient for finding true authorities.

    PageRank also just isn’t useful for recency-sensitive queries on bursty topics, which arise too quickly for links or citations to accumulate.

    Propensity for Sharing Early Links

    Instead, one signal of authority can be based on a propensity to provide early links to Web sites or Web content that becomes popular with other users.

    Authors who display such a propensity can be identified and ordered in a rank list, which could be used to rank or re-rank search results to take into account author authority or for filtering search results to exclude spammers.
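The early-link idea above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own, not the method from the patent filing: a URL counts as "popular" once at least `min_sharers` distinct authors have shared it, and a share counts as "early" if it falls within the first `early_fraction` of those sharers. The function names and both parameters are hypothetical.

```python
import math
from collections import defaultdict


def early_link_scores(shares, min_sharers=3, early_fraction=0.25):
    """shares: iterable of (author, url, timestamp) tuples.

    Returns {author: share of that author's distinct URLs that were
    shared early AND later became popular}. Thresholds are assumptions.
    """
    # Keep only the first time each author shared each URL.
    first_share = {}
    for author, url, ts in shares:
        key = (url, author)
        if key not in first_share or ts < first_share[key]:
            first_share[key] = ts

    by_url = defaultdict(list)
    for (url, author), ts in first_share.items():
        by_url[url].append((ts, author))

    early_hits = defaultdict(int)
    total_urls = defaultdict(int)
    for entries in by_url.values():
        entries.sort()  # oldest share first
        for _, author in entries:
            total_urls[author] += 1
        if len(entries) < min_sharers:
            continue  # never became "popular" under this toy definition
        cutoff = max(1, math.ceil(early_fraction * len(entries)))
        for _, author in entries[:cutoff]:
            early_hits[author] += 1

    return {a: early_hits[a] / total_urls[a] for a in total_urls}


def rank_authors(scores):
    """Order authors by score, highest first; ties broken by name."""
    return sorted(scores, key=lambda a: (-scores[a], a))
```

A rank list built this way could then be used to re-rank results or to filter out accounts whose score stays near zero, along the lines the filing describes.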

    Topical Authority

    Here’s how a search engine might determine Topical Authority.

    It might begin by searching over a body of social media updates, such as microblog or blog posts, containing keywords associated with a given topic, such as the topic of a query entered into a search engine. That topic may be expanded to include related topics.

    A raw feature extraction may be performed from the data returned in response to the query and any associated author/user data. A number of features about the authors and posted data resulting from the query may be examined, including:

    • Raw count of topical posts
    • How often an author is cited by other authors
    • How often an author cites themselves
    • Number of times an author is replied to
    • Total number of posts authored in the system
    • How often they are mentioned by other users
    • Number of links an author has shared
    • How often they use explicitly denoted keywords (e.g., hash tags)
    • A similarity index that computes how similar an author’s recent content is to previous content
    • A timestamp of an author’s first post on the topic
    • A timestamp of their most recent post on the topic
    • A count of friends/followers who also post on the topic
    • A count of an author’s social media friends/followers who posted on the topic before the author in question posted on the topic
    • A count of an author’s social media friends/followers who posted on the topic after the author in question posted on the topic.
    • Other signals could also be used.

    Authors who fall below a certain threshold on these topical signals would be pruned from the set of users considered as possible authorities on a specific topic.

    The remaining users would then be clustered into two groups, authorities and non-authorities, again based upon those signals.

    At that point, the authors within the cluster of authorities are ranked in order based upon their scores for the features.
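The prune, cluster, and rank steps above might look something like the following sketch. The specifics here are my assumptions rather than details from the patent filing: features are min-max scaled and averaged into one score, pruning uses a hypothetical `topical_posts` feature with a `min_topical_posts` cutoff, and the two-group clustering is a simple one-dimensional two-means split standing in for whatever clustering method the filing actually uses.

```python
def combined_scores(features):
    """features: {author: {feature_name: value}}.

    Returns {author: mean of min-max scaled feature values}.
    """
    names = sorted({n for f in features.values() for n in f})
    scores = {a: 0.0 for a in features}
    for n in names:
        vals = [f.get(n, 0.0) for f in features.values()]
        lo, hi = min(vals), max(vals)
        span = (hi - lo) or 1.0  # avoid dividing by zero on constant features
        for a, f in features.items():
            scores[a] += (f.get(n, 0.0) - lo) / span
    return {a: s / max(len(names), 1) for a, s in scores.items()}


def two_means_split(values, iterations=20):
    """1-D k-means with k=2; returns the midpoint between the two centres."""
    lo_c, hi_c = min(values), max(values)
    for _ in range(iterations):
        mid = (lo_c + hi_c) / 2
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not low or not high:
            break
        lo_c, hi_c = sum(low) / len(low), sum(high) / len(high)
    return (lo_c + hi_c) / 2


def rank_topical_authorities(features, min_topical_posts=2):
    # Step 1: prune authors below a (hypothetical) topical-post threshold.
    kept = {a: f for a, f in features.items()
            if f.get("topical_posts", 0) >= min_topical_posts}
    if not kept:
        return []
    # Step 2: split the remaining authors into two clusters by score.
    scores = combined_scores(kept)
    boundary = two_means_split(list(scores.values()))
    authorities = [a for a, s in scores.items() if s > boundary]
    # Step 3: rank the authority cluster, highest score first.
    return sorted(authorities, key=lambda a: (-scores[a], a))
```

Averaging scaled features is the simplest possible way to combine them; a real system would likely weight signals differently per topic.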

    User Metric Signals

    While the description within the patent application focuses upon Twitter and tweets, it makes it clear that a system like this could be applied to other forms of social media, including status updates and blog posts.

    We’re shown how user metrics might be identified, categorized, and explored on Twitter, but we have to keep in mind that similar metrics might be used for Facebook and even blog posts and comments on other sites.

    As for Tweets, they might be categorized into three categories:

    – Original tweet – standing on its own
    – Conversational tweet – directed at another user (e.g., as shown by the use of an @username token before the text, or through associated meta-data).
    – Repeated tweet – copied or forwarded by someone into the social network, and often started with “RT @username”.

    Metrics might also be computed about mentions of another user, but independently of those mentions in a conversational tweet or a retweet.

    The use of hashtag keywords (starting with a # symbol) might also be reviewed.
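A rough sketch of that categorization, along with mention and hashtag extraction, is below. The regular expressions are my own simplification based only on the “@username” and “RT @username” conventions mentioned above. A real system could also rely on metadata, which this toy version ignores, and the category labels and function names are hypothetical.

```python
import re

# Text-only heuristics; these patterns are assumptions, not from the filing.
_RETWEET = re.compile(r"^\s*RT\s+@\w+", re.IGNORECASE)
_CONVERSATIONAL = re.compile(r"^\s*@\w+")
_MENTION = re.compile(r"@(\w+)")
_HASHTAG = re.compile(r"#(\w+)")


def classify_tweet(text):
    """Return 'repeated', 'conversational', or 'original'."""
    if _RETWEET.match(text):
        return "repeated"
    if _CONVERSATIONAL.match(text):
        return "conversational"
    return "original"


def tweet_metrics(text):
    """Classify a tweet and pull out mentions and hashtags.

    The leading 'RT @user' or '@user' token is stripped first, so that
    mentions are counted independently of the retweet or reply target.
    """
    category = classify_tweet(text)
    body = text
    if category == "repeated":
        body = _RETWEET.sub("", text, count=1)
    elif category == "conversational":
        body = _CONVERSATIONAL.sub("", text, count=1)
    return {
        "category": category,
        "mentions": _MENTION.findall(body),
        "hashtags": _HASHTAG.findall(body),
    }
```

Calling `tweet_metrics("RT @alice: great read by @bob #seo")` would classify the tweet as repeated and count only `bob` as a mention, since `alice` is the retweet source rather than someone being mentioned.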

    A self-similarity score measures how much someone reuses words from their previous posts (on topic and off topic). A stopword list might be used to eliminate very common words first.

    Those tweets might be ordered by timestamp and examined to see how similar they are, with a couple of interesting assumptions. One is that someone using many of the same words from their previous posts might be engaged in spam behavior. In the opposite direction, someone using the same words much less frequently might be posting on a much wider set of topics, or may have a very large vocabulary.
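One simple way to picture a self-similarity score is sketched below. It is an assumption of mine, not the formula from the patent filing: posts are ordered by timestamp, and for each post after the first we measure what fraction of its non-stopword words already appeared in the author's earlier posts, then average those fractions. The tiny stopword list is a placeholder.

```python
import re

# Placeholder stopword list; a real one would be much longer.
STOPWORDS = frozenset(
    {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on", "for"}
)


def tokenize(text):
    """Lowercase the text and return its set of non-stopword words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def self_similarity(posts):
    """posts: list of (timestamp, text) tuples for one author.

    Returns the average fraction of each post's words that the author
    had already used in earlier posts (0.0 if there is nothing to compare).
    """
    seen = set()
    ratios = []
    for i, (_, text) in enumerate(sorted(posts)):  # oldest first
        words = tokenize(text)
        if i > 0 and words:
            ratios.append(len(words & seen) / len(words))
        seen |= words
    return sum(ratios) / len(ratios) if ratios else 0.0
```

Under this toy measure, an account that keeps repeating the same phrase scores close to 1, while an author who ranges across unrelated topics scores close to 0, which lines up with the two assumptions described above.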

    Other signals could involve:

    – A topical signal that estimates how much an author is involved in a particular topic.

    – An originality signal that looks at whether an author tends to originate conversations on a particular topic, and how often he or she engages in conversations on that topic and replies back to others on it out of courtesy. Those responses can be helpful in finding real people who tend to be social.

    – How often a person’s posts are resent by others, dampened by the impact of some “overzealous” users who tend to retweet frequently.

    – Whether or not mentions of others are based upon merit – actual conversations and interactions with others.
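To make the dampening idea in the retweet signal a little more concrete, here is a minimal sketch. The logarithmic weighting is purely my assumption about how “overzealous” retweeters might be discounted; the patent filing may use a different formula. Each distinct retweeter contributes log2(1 + n) for their n retweets of the author, so one fan retweeting ten times counts for much less than ten people retweeting once each.

```python
import math
from collections import Counter


def dampened_retweet_impact(retweeters):
    """retweeters: iterable of usernames, one entry per retweet of the
    author's posts. Returns a dampened count (hypothetical weighting).
    """
    counts = Counter(retweeters)
    return sum(math.log2(1 + n) for n in counts.values())
```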

    The patent filing provides a number of other signals, as well as a couple of different ways to use these signals together to come up with scores based upon them regarding different topics.

    Takeaways

    I’ve written about how Yahoo may rank user generated content and how Google might also rank social interactions on sites that rely upon user generated content as well.

    If we look at the three somewhat different approaches from each of the search engines together, I think we get a useful glimpse at some of the problems and issues that a search engine might face in trying to rank content created in social settings where the vast majority of that content is created by users, and where the search engine has an interest in indexing that content.

    How does a search engine surface the most useful and interesting content on a specific topic? How do they filter out original content from the sharing of that content?

    How do they identify fake profiles and sock puppets? How do they identify the best sources on different topics?

    All three look at the quality of original contributions to a social system, whether in the context of a forum or Q&A site or a microblogging platform. All look at interactions between members of those systems, and the meaningfulness of those interactions.

    Those three patent filings aren’t the final say from each of the search engines on how they may rank content from social sources, and include that content within social search results. But I think taken together they provide some useful guidelines on how someone might be perceived as an authority on different topics.


    24 thoughts on “Microsoft Weighs in on Ranking Authors in Social Networks”

    1. Great post. I think we should not forget personalization and its effect on social authority. I may be an influencer on a particular topic in my community but may not be an authority at all outside my ecosystem. So I think social authority varies from social circle to social circle and should not be confused with global authority on a topic unless I am a part of a very big ecosystem where I have considerable influence. So for example if you are connected to me and not to Avinash you may see my posts at the top of your personalized search results because I may be the top authority on analytics in your ecosystem. But if you are also connected to Avinash, then Avinash will override me and claim the top social authority status and search engines will suggest his posts.

    2. Determining the “authority” of commenters or posters to decide on the quality of link is a great idea. I just looked at the link profile of a site in my niche who creates fake profiles and fake names to post junk comments on EDU and other do-follow blogs in order to rank for their keywords. You’d think that by now, Google would have found the solution to this nonsense, but the tactics definitely work, as the website they are promoting ranks #1 for the keywords they are focused on.

      It is time to devalue links posted by posters of junk and give credence to authority posters instead.

    3. Thanks for all the information. Staying on top of this has been a tremendous amount of work, but I appreciate the way you’ve broken this down. On one hand it has given me a lot of information, but on the other hand it just reconfirms how much I really need to improve at being recognized as an authority.

    4. It’s such a complicated thing to measure. And something that’s so appealing to people that would want to potentially manipulate the results for their own benefit. With the sheer volume and speed that information is pumped out online, how do you so quickly determine what’s relevant and legitimate without giving a bias towards celebrity profiles or sites?

      Great blog…I just stumbled across it today and subscribed. Keep up the good work.

    5. Hi Himanshu.

      I think we are possibly eventually going to see social signals being used for rankings within both a logged in social search and a logged out search as well. We’ll likely see the social/personalized results first in both Google and Bing before we see the impact of such signals in a logged out set of web search results.

      For those logged in results, the impact of social connections will likely be seen in a manner like you describe. I’m just not sure that we should call them “personalized” results. Does Google or Microsoft consider their social results to be part of personalization? I’m not sure that they do.

      Limiting social based results to people we make connections with isn’t necessarily the same as showing us results based upon our previous browsing and search history. I think we are impacted by both when logged into search results.

    6. Hi Joshua,

      I suspect that we’re going to see the separate development of social signals and link based signals for a while, with Google and Bing possibly trying to find other ways to diminish the value of links from junk comments. I don’t think that comments associated with specific social accounts are going to replace links that aren’t, quite yet, and they may never. But they might be an additional layer of value that could potentially influence rankings.

    7. Hi David,

      Thank you. A couple of years ago, there wasn’t much emphasis on things like reputation scores and authority signals for authors in SEO, and I’m not sure how much emphasis people are placing on those types of things quite yet. But it’s pretty clear that’s one of the directions that both Google and Bing are taking.

    8. Hi Shane,

      Thanks.

      One of the ways to avoid a bias towards celebrities and well known sites is definitely to look past simple social engagement measures like numbers of followers, and actually look at other signals such as whether or not someone has a propensity to provide a link to a particular page before a good number of other people do like this Microsoft patent filing suggests, or to look at how meaningful someone’s contributions to a social network and their interactions with others might be, like some Google patent filings suggest.

    9. Definitely the propensity for sharing links early (towards content that will then begin receiving links) is an interesting way of distinguishing an influential author, especially if the content itself is really fresh.

    10. Hi Eliseo,

      It is a very interesting idea to provide a boost in reputation to an author who shares early links. I’m not so sure that I would call that person influential, because his or her sharing might not necessarily influence others to choose those. But the ability to identify and share content that becomes popular is a sign that a person is very observant on specific topics, and is displaying a knowledge of that topic.

    11. I enjoyed reading this post. I’ve seen some spammy services pop up that try and replicate the signals of authority but if you dig, and not so deep, you can tell. I’m sure it doesn’t take much for search engines to figure this out as well.

    12. Hi Stuart,

      Thanks. It is interesting seeing the ideas that both Google and Bing are coming up with to try to understand how authoritative authors might be within the topics that they post about when they participate in social networks.

      There are some services out there that try to show off social scores for people participating in social networking that just can’t do the same job a search engine might because they just don’t have the kinds of access to collateral data about those activities that the search engines do.

    13. As Joshua pointed out, many companies and agencies still use these types of fake persona tactics for promotion. At some point, I believe their effectiveness will diminish or become unsustainable due to the sheer amount of work required to maintain some form of credibility for “sock puppet” (lol) accounts.

      For this reason, I’ve decided to just brand myself for the most part, with my real name.
      Besides working with an agency by day, I also run 4 eCommerce shops, a eCommerce consulting service and a niche blog. I can’t imagine trying to maintain 6 or 7 different “personas” on several social media sites.

      I run the risk of some posts being off topic, or even commercial in nature, however I think in time my AuthorRank will be improved via one active profile on my important social network properties, vs. sparse content across multiple personas.

      I don’t see any other way to do it.

    14. Hi Jeff,

      Thanks. The days of search engines giving value to fake personas and sock puppets are growing increasingly limited. The amount of work that it might take for an agency or individual to make a persona look real enough for a search engine to give any credence to it is increasing as efforts like Google’s authorship project grow. That’s likely one of the reasons why Google and Bing are looking to add some element of “authorship” and reputation to search rankings.

      I don’t think there’s too much harm in “posting” off topic tweets or status updates or so on. It does look like the search engines will likely develop reputation scores for individuals along different topics, so that you might be considered an expert in ecommerce and just somewhat knowledgeable in photography or gardening or some other topic that you might write a few blog posts or Google Plus posts about. You may actually be better off having a little diversity to your real and known persona.

    15. Bill Slawski thanks for the great post and insight into specific metrics. Jeff Bronson, I agree with you that the “work required to maintain … credibility for “sock puppet”” is too great. I was attempting this myself so my message could be highly targeted to specific audiences, but agree that AuthorRank will be greater if you consolidate your social interactions into your personal brand – specifically for those who are very socially active.

    16. Interesting topic Bill. It reminds me of the Klout score that supposedly tries to rank the online influence of an individual. It’s also reminiscent of the days when one’s power ranking @ digg meant something. As others have said, the search engines need to get to the bottom of a solution for all the sock puppet / SEO generated trash content out there on the web. You cannot rely on links alone as those can be machine generated.

    17. Hi Kelly,

      You’re welcome. To build a reputation or author rank that might actually have some impact, especially for topics that a lot of people are talking about (or that might potentially have some commercial value), I would expect that it would take a lot of work, a lot of social activity and meaningful interaction.

    18. Hi jjray7,

      Thanks.

      Social scores for authors are definitely going to rely upon things that go beyond just numbers of followers or people you might have in circles, or volume of tweets or status updates. There are signals that the search engines are looking at that measure things in ways that a sock puppet might have a hard time matching.

    19. Thanks Bill. How would you quantify or measure the social activity necessary to have influence on a particular subject? Are there any case studies from which we can look towards as an example? I find the subject fascinating actually as a complicated metric for search engines but it could offer some valuable results to those searching.

    20. Hi Bill – just found your blog while looking for an answer to “how do search engines deal with UGC”. Excellent stuff. It seems to me UGC has become the heart of the web and the search engines will need to develop ways as described here to drill down and factor it into search results. The SEO industry may be about to experience their own Arab spring and I think most businesses and organisations are beginning to realize that poor quality links and manufactured reviews will be punished. This also has significance to how sites using live data feeds choose what to publish.
