Added: This patent application was granted at the USPTO on 2/24/2015 – Ranking User Generated Web Content
One of the challenges that face search engines is how to rank content found on sites that rely upon users to create that content, often referred to as User Generated Content or UGC. Towards the end of 2009, I wrote a post about a Yahoo patent that described some of the things they might consider looking at when ranking UGC, in the post How Search Engines May Rank User Generated Content.
With Google’s recent launch of Google Plus, I’m anticipating posts and comments from their new social network system to start appearing in Google Web search results sometime soon.
A Google patent application published this past May at the World Intellectual Property Organization (WIPO) describes possible signals that Google might consider in its Web search results when it displays and ranks images and videos on photo and video sharing sites, questions and answers on Q&A sites, forum posts and responses, blog posts and comments, and social network posts, status updates, and comments. It was originally filed on October 29, 2009, but it describes a system that looks like it could be used with Google + without too many modifications. The patent filing hasn’t been published yet at the US Patent and Trademark Office.
The system behind the patent filing would work with sites where people are required to log in using some kind of identification information before adding content, and where people can interact with each other by posting questions or other content, and people can respond. The interactions between people could be weighed, and credential scores for each user would be generated based upon those interactions. Quality factors could be weighed for comments or answers to posts and questions and other user-generated content. Interactions can involve submitting a post or comment, uploading content such as images or videos, rating something, or even viewing something that someone else has submitted.
Credential Scores for Users
The posts and questions and comments might show up in Google’s search results based upon how relevant they are to a query, and upon credential scores for the people who created them. These credential scores are a type of author rank. They would combine an authority score and a contributiveness score, both derived from a person’s interactions with others on a social network and the weighting factors associated with those interactions.
An authority score would be based upon an analysis of the quality of the responses that someone makes on a social networking site and the contributiveness scores of the people who posted the content being responded to.
Contributiveness scores would be based upon the quality of something that you post or upload to a social network and the authority scores of people who respond to that content.
There’s no telling if Google is using the processes from this patent filing at this point or if they will in the future, but it’s possible that they may use something similar. If this were to be applied to Google Circles, when you make a post, your contributiveness score would be based upon the authority scores of people who respond to that post. When you respond to a post, your authority score would be influenced by the quality of responses that you provide and the contributiveness of the people who originally posted the content that you are responding to.
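The way authority and contributiveness feed into each other resembles a hub-and-authority style iteration. Here is a minimal sketch of that idea in Python. The patent filing doesn't give this formula, so the function name `credential_scores`, the `(responder, poster, weight)` tuple format, and the normalization step are all assumptions made for illustration.

```python
from collections import defaultdict

def credential_scores(interactions, iterations=20):
    """Hypothetical mutual-reinforcement scoring.

    interactions: list of (responder, poster, weight) tuples, where
    weight is an assumed quality value for the response, in [0, 1].
    Returns (authority, contributiveness) dicts keyed by user.
    """
    users = {u for responder, poster, _ in interactions for u in (responder, poster)}
    authority = {u: 1.0 for u in users}
    contrib = {u: 1.0 for u in users}

    for _ in range(iterations):
        new_auth = defaultdict(float)
        new_contrib = defaultdict(float)
        for responder, poster, weight in interactions:
            # Responders gain authority by giving quality responses
            # to people with high contributiveness.
            new_auth[responder] += weight * contrib[poster]
            # Posters gain contributiveness by drawing quality
            # responses from people with high authority.
            new_contrib[poster] += weight * authority[responder]
        # Normalize each score set so the values sum to 1.
        auth_total = sum(new_auth.values()) or 1.0
        contrib_total = sum(new_contrib.values()) or 1.0
        authority = {u: new_auth[u] / auth_total for u in users}
        contrib = {u: new_contrib[u] / contrib_total for u in users}
    return authority, contrib

authority, contrib = credential_scores([
    ("alice", "bob", 0.9),  # alice gives bob a strong response
    ("carol", "bob", 0.2),  # carol gives bob a weak response
])
```

In this toy example, alice ends up with more authority than carol because her response carried a higher weight, and bob holds all of the contributiveness because he is the only one who posted.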
We’re also told in the patent application that user credential scores and rankings based upon those scores might be different for different categories or labels associated with user generated content. Someone commenting or posting on gardening and also on SEO might have one credential score for gardening and a different one for SEO.
So where might Google get those categories or labels? In a forum setting, that might be the name of the particular forum you are posting in, such as an “internet marketing” section of a webmaster forum. If you’re commenting on a blog post, it might be the category or tags used on the original blog post.
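One simple way to picture per-category scores is a lookup keyed by both user and label. The sketch below is only an illustration: the class name `CategoryCredentials`, the default score of 0.0, and the lowercasing of labels are assumptions, not details from the patent filing.

```python
from collections import defaultdict

class CategoryCredentials:
    """Hypothetical store of credential scores, one per user per category."""

    def __init__(self, default_score=0.0):
        self._scores = defaultdict(dict)  # user_id -> {category: score}
        self._default = default_score

    def set_score(self, user_id, category, score):
        # Labels might come from a forum section name or blog post tags.
        self._scores[user_id][category.lower()] = score

    def get_score(self, user_id, category):
        # Users with no history in a category fall back to the default.
        return self._scores.get(user_id, {}).get(category.lower(), self._default)

store = CategoryCredentials()
store.set_score("user42", "gardening", 0.8)
store.set_score("user42", "SEO", 0.3)
```

A ranking step could then ask for `store.get_score("user42", "gardening")` when scoring a gardening post, and get a different value for the same person's SEO posts.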
The weights between each relationship or link between two members of a network might be based upon:
- How relevant a response or comment might be from the first person to something that the second person posted,
- How original a post or comment or piece of content submitted to the network might be compared to other content items,
- How much “coverage” or broadening of a topic a piece of content might add to the network, based upon a measure of uncommon terms in the post or comment or reply,
- How “rich” the content item might be (does it include multimedia or rich media content?), or
- The timeliness of a content item, such as a quick comment in response to a post, or a fast answer to a question.
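The factors listed above could be folded into a single weight for a link between two people. The sketch below uses a plain weighted average. The patent filing doesn't say how the factors are combined, so the importance values in `SIGNAL_IMPORTANCE` and the assumption that each signal is already scaled to the range 0 to 1 are purely illustrative.

```python
# Hypothetical importance of each factor; the values are made up.
SIGNAL_IMPORTANCE = {
    "relevance": 0.35,
    "originality": 0.20,
    "coverage": 0.15,
    "richness": 0.10,
    "timeliness": 0.20,
}

def interaction_weight(signals, importance=SIGNAL_IMPORTANCE):
    """Combine per-factor signals (each assumed in [0, 1]) into one weight."""
    total = sum(importance.values())
    # Missing signals count as 0.0 rather than raising an error.
    score = sum(importance[name] * signals.get(name, 0.0) for name in importance)
    return score / total

quick_relevant_reply = interaction_weight({
    "relevance": 0.9,
    "originality": 0.6,
    "coverage": 0.4,
    "richness": 0.0,
    "timeliness": 1.0,
})
```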
A system like this might be used to:
- Show posts or comments in search results,
- Reward users for high quality input, or
- Restrict access based on low quality contributions (and possibly consider them to be spammers).
This system might also be used to personalize search results by looking at the relationship strength between different users of one or more social networks and possibly boost relevant results in those search results based upon that relationship strength. So, if there’s someone whom you have quality interactions with on a social network on a regular basis, and you search for something in Google that they’ve written something relevant about, their result may appear higher than it otherwise would have because of those interactions.
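A personalization step like that could be as simple as multiplying a result's base score by a factor tied to relationship strength. This is a sketch under assumptions: the function name `personalize`, the `boost` parameter, and the multiplicative formula are hypothetical, and the filing doesn't describe the actual calculation.

```python
def personalize(results, relationship_strength, boost=0.5):
    """Re-rank results using the searcher's relationship strengths.

    results: list of (url, author, base_score) tuples.
    relationship_strength: dict of author -> strength in [0, 1],
    from the searcher's point of view (an assumed representation).
    """
    rescored = []
    for url, author, base_score in results:
        strength = relationship_strength.get(author, 0.0)
        # Authors the searcher interacts with get a proportional lift.
        rescored.append((url, base_score * (1.0 + boost * strength)))
    return sorted(rescored, key=lambda item: item[1], reverse=True)

ranked = personalize(
    [("example.com/a", "stranger", 1.0), ("example.com/b", "friend", 0.9)],
    {"friend": 1.0},
)
```

Here the friend's page starts with a lower base score but moves to the top of `ranked` once the relationship boost is applied.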
The patent application is:
Ranking User Generated Web Content
Invented by: Xiance Si, Jian Gong Deng, Huacheng Ke, Dong Zhang, Zoltan I. Gyongyi, and Edward Y. Chang
Publication Number WO/2011/050495
Publication Date: May 5, 2011
International Filing Date: October 29, 2009
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for analyzing quality of user-generated content involve identifying interactions between users through an electronic network and assigning a weighting factor to each interaction representing a quality of the interaction. A user credential score is generated for each user based on the weighting factors for each interaction. The user credential scores are stored in association with a user identifier on a computer-readable storage device.
Quality Values for Posts and Responses
The patent filing provides some examples of things that might be considered when the search engine analyzes a post or content item to create a quality value.
For example, on an online discussion forum, an initial question or post would be analyzed to determine:
- Its relevance to the forum topic,
- Appropriateness of language (e.g., lack of profanity), and/or
- Originality in relation to previously-posted questions.
If someone were to respond to the original post, the quality of their response might include looking at:
- Its relevance to the question,
- Appropriateness of language used (e.g., lack of profanity),
- Specificity of response,
- Originality in relation to previously-posted responses, or
- Promptness in relation to the timestamp of the original posting of the question.
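Two of those response signals, promptness and originality, are easy to sketch. The choices below are assumptions rather than anything stated in the patent filing: promptness decays by half every `half_life_hours`, and originality is one minus the highest word-overlap (Jaccard similarity) with any earlier response.

```python
def promptness(question_ts, response_ts, half_life_hours=24.0):
    """Score in (0, 1]; timestamps are seconds since the epoch."""
    delay_hours = max(0.0, (response_ts - question_ts) / 3600.0)
    # The score halves for every half_life_hours of delay.
    return 0.5 ** (delay_hours / half_life_hours)

def originality(text, previous_texts):
    """1.0 for a wholly new response, 0.0 for a repeat of an earlier one."""
    words = set(text.lower().split())
    if not words:
        return 0.0
    highest_overlap = 0.0
    for previous in previous_texts:
        previous_words = set(previous.lower().split())
        # Jaccard similarity: shared words over all distinct words.
        overlap = len(words & previous_words) / len(words | previous_words)
        highest_overlap = max(highest_overlap, overlap)
    return 1.0 - highest_overlap
```

Values like these could serve as the timeliness and originality inputs when a quality value for a response is calculated.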
The quality of those posts and the responses to them would be used to determine the quality of the interactions between the people involved. If the participants were included in a user activity social graph, the strengths of the relationships or links between those individuals would be based upon the quality of the content that they’ve posted and responded to.
Some examples of how interactions can impact someone’s credential score:
- If someone responds to a high quality question with a high quality answer, their interaction may positively impact their credential score.
- If someone responds to a question with a low quality answer, their interaction may negatively impact their credential score.
- If someone responds to a question posted by someone with a high credential score, their interaction may more positively impact their credential score than if they respond to someone with a low credential score.
- If someone posts a question, and they receive high quality responses from people with high credential scores, that interaction can positively impact the original poster’s credential score.
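The four examples above can be turned into a small update rule. This is only a sketch under stated assumptions: quality values and credential scores are taken to lie between 0 and 1, an answer quality below 0.5 counts as "low," and the function name `apply_interaction`, the `rate` parameter, and the formulas themselves are hypothetical rather than taken from the patent filing.

```python
def apply_interaction(credentials, poster, responder,
                      question_quality, answer_quality, rate=0.1):
    """Update credential scores in place after one question/answer exchange."""
    poster_cred = credentials.get(poster, 0.5)
    responder_cred = credentials.get(responder, 0.5)
    # Positive for good answers, negative for poor ones.
    centered = answer_quality - 0.5

    # The responder's change scales with the question's quality and
    # with the standing of the person they responded to.
    responder_delta = rate * centered * question_quality * (0.5 + poster_cred)
    # The poster is rewarded only for attracting good answers, and more
    # so when those answers come from people with high credential scores.
    poster_delta = rate * max(0.0, centered) * responder_cred

    credentials[responder] = responder_cred + responder_delta
    credentials[poster] = poster_cred + poster_delta
    return credentials

scores = apply_interaction({"asker": 0.9, "helper": 0.5}, "asker", "helper",
                           question_quality=0.8, answer_quality=0.9)
```

One side effect of this particular formula is that a poor answer to a well-regarded poster costs the responder more than a poor answer to a low-scoring one. The examples in the filing don't address that case, so treat it as a design choice of the sketch.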
I could very easily see a Q&A site like Quora, or any number of forums using this system from the patent without many changes at all, and it would likely work well with Google + as well. It’s even possible that some aspects of it may be in use to display content from sites like Quora.
Chances are that Google may use something very similar in the future to help decide what posts or comments to display from Google Plus.