Do Search Engines Use Social Media to Discover New Topics?

A new patent filing from Yahoo raises the question, “How much has social media influenced the expectations of searchers, and forced search engines to change?”

Before I can even begin to think about that, I have to ask whether looking at Yahoo patents is even a good idea after their 2009 deal with Microsoft to have Bing power their search results.

The Yahoo patent application was filed after the agreement between Yahoo and Microsoft, and was published last week. Are Yahoo patents still worth spending time with? After reading through the Yahoo patent application about how the search engine might use information from social media platforms to discover recently hot topics and webpages that are relevant to those topics, I would say that they are. The terms of the agreement between Yahoo and Microsoft include a 10-year exclusive right for Microsoft to use search technologies developed by Yahoo, and don’t stop Yahoo from applying those technologies itself.

The patent filing explores “recency-sensitive” queries, where searchers are looking for resources that are both topically relevant and fresh, such as novel information about an earthquake. If you’ve been watching Twitter streams, Facebook updates, and other social media, you’ve seen that sometimes these sources are the best and fastest places on the Web to find that kind of information.

It’s possible that a search engine that ignores sources like those isn’t going to be able to return any relevant results for those types of queries – what the patent’s inventors call a “zero recall” problem.

Whether it’s Shaquille O’Neal first announcing his retirement on Twitter, or news of an earthquake traveling across the social network faster than its shockwaves might spread across the firmament, or another event that an eyewitness reported before the media had time to file, edit, and publish a story, social media has increased the speed with which news travels across the globe. These time-sensitive stories are forcing the search engines to look to social media to find information that people are interested in hearing about, as close to the time they happen as possible.

The freshness of content found on the Web is going to be influenced by the crawl policies imposed upon a web crawler, so Googlebot, Yahoo’s Slurp, or Bingbot may visit a particular page and then not return for a while, based upon their specific schedules for crawling a site. That means that breaking news won’t always reach a search engine that relies upon crawling as quickly as the sites covering it are able to publish it.

And even if a search engine were able to capture such fresh information, the ranking signals that most search engines use tend to be based upon features that relate to “long-term popularity and usage that can be used for ranking such as in-link statistics, Web page rank, click-based statistics, or the like.” A page that is only minutes old hasn’t had time to accumulate those kinds of signals.

This patent application introduces a crawler that monitors microblog data streams, which include things like tweets and status updates, to discover and index fresh content that might be used in response to recency-sensitive queries.
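To make the discovery step a little more concrete, here is a minimal sketch in Python of pulling links out of a stream of posts and keeping only the ones a search engine hasn’t indexed yet. This is my own illustration, not code from the patent filing: the function name `discover_fresh_urls`, the URL pattern, and the sample posts and `example.com` addresses are all assumptions made up for the example. A real system would presumably also need to expand shortened links, which this sketch skips.

```python
import re

# Assumed, simplified URL pattern; real posts would need more careful parsing.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def discover_fresh_urls(posts, indexed_urls):
    """Return URLs found in posts that are not already indexed, in first-seen order."""
    seen = set()
    fresh = []
    for post in posts:
        for url in URL_PATTERN.findall(post):
            # Drop punctuation that commonly trails a link at the end of a sentence.
            url = url.rstrip(".,;:!?)")
            if url not in indexed_urls and url not in seen:
                seen.add(url)
                fresh.append(url)
    return fresh


# Hypothetical sample data standing in for a microblog data stream.
sample_posts = [
    "Big quake just now! Details: http://example.com/quake-report.",
    "Reading http://example.com/old-story again",
    "RT Details: http://example.com/quake-report",
]
already_indexed = {"http://example.com/old-story"}

fresh_urls = discover_fresh_urls(sample_posts, already_indexed)
print(fresh_urls)
```

The point of the sketch is only that a data stream hands the links over directly, so nothing has to wait for a scheduled crawl to stumble across them.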

Last year, I wrote about a Yahoo patent that described how they might decide which pages to use as “seed sites” to start web crawls with to identify quality pages, in the post What Makes a Good Seed Site for Search Engine Web Crawls?, and there was a lot of discussion in the comments about the potential value of looking at social media sites as starting points for crawls. The biggest value in using those does appear to be in finding very up-to-date content, and using a data stream directly from those sources means less work in actually crawling those pages.

The patent filing is:

Ranking of Search Results based on Microblog data
Invented by Anlei Dong, Pranam Kolari, Ruiqiang Zhang, Jing Bai, Yi Chang, Zhaohui Zheng
Assigned to YAHOO! INC.
US Patent Application 20110246457
Published October 6, 2011
Filed: March 30, 2010

Abstract

An information retrieval system is described herein that monitors a microblog data stream that includes microblog posts to discover and index fresh resources for searching by a search engine. The information retrieval system also uses data from the microblog data stream as well as data obtained from a microblog subscription system to compute novel and effective features for ranking fresh resources which would otherwise have impoverished representations.

An embodiment of the present invention advantageously enables a search engine to produce a fresher set of resources and to rank such resources for both relevancy and freshness in a more accurate manner.

The patent points to the following as reasons to look at microblogging information to help respond to recency-sensitive queries:

(1) Microblog posts are likely to contain URLs of important documents that have not yet been indexed by a search engine via conventional Web crawling;

(2) Documents linked to from a microblog post may be relevant to recency-sensitive queries;

(3) Text found in microblog posts can be used to expand the text used to find resources involved in such a query; and

(4) Other aspects of a social network can be used to rank search results.
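The third reason is the one that took me a moment to picture, so here is a rough sketch in Python of the idea: a brand new page has little text or link data associated with it, but the posts that link to it can lend it their words. Everything here is an assumption for illustration, including the function names `build_post_text_index` and `match_query`, the simple word matching, and the sample posts; the patent filing doesn’t spell out code like this.

```python
import re
from collections import defaultdict

URL_PATTERN = re.compile(r"https?://\S+")
WORD_PATTERN = re.compile(r"[a-z0-9]+")


def build_post_text_index(posts):
    """Map each linked URL to the set of words used in the posts that link to it."""
    terms_by_url = defaultdict(set)
    for post in posts:
        urls = URL_PATTERN.findall(post)
        # Remove the links themselves so only the surrounding text is indexed.
        text = URL_PATTERN.sub(" ", post).lower()
        words = set(WORD_PATTERN.findall(text))
        for url in urls:
            terms_by_url[url] |= words
    return terms_by_url


def match_query(query, terms_by_url):
    """Return (sorted) URLs whose borrowed post text contains every query word."""
    query_words = set(WORD_PATTERN.findall(query.lower()))
    return sorted(
        url for url, terms in terms_by_url.items() if query_words <= terms
    )


# Hypothetical posts; example.com URLs are placeholders.
posts = [
    "Strong earthquake felt in Virginia http://example.com/a",
    "Virginia earthquake damage photos http://example.com/a",
    "New kitchen designs http://example.com/b",
]
index = build_post_text_index(posts)
matches = match_query("virginia earthquake", index)
print(matches)
```

A page found this way can be matched to a query before the search engine has crawled it or seen any links to it, which is one way around the “zero recall” problem.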

There’s been some discussion on the Web about how search engines like Google might or might not use social signals like tweets to rank pages found through those types of resources. This patent filing points at how those resources might be discovered. Rather than presenting the tweets or status updates themselves to searchers, the aim of the process described in the patent is to find pages through microblog posts and present those pages to searchers.

The patent does provide a fair amount of discussion about how those discovered URLs might be ranked as well, and describes using a number of the approaches that have been developed by Microsoft in ranking pages. This tie-in with Microsoft approaches is one of the things that leads me to the conclusion that Yahoo is working to develop ways to find and rank content that will fit in with Microsoft’s indexing of content.

Things like tweets or status updates might be ranked as sources of information based upon a combination of textual features related to them as well as social networking features associated with people who provided the posts.

Some of those social networking features may include things like the number of followers the poster has, the number of posts they have made, the average number of responses their posts have received, the number of people who shared or retweeted their posts, and others.
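As a sketch of how features like those might be folded into a single number for a poster, here is a small Python example. The weights, the log scaling, and the names `PosterStats` and `poster_score` are entirely hypothetical choices of mine; the patent filing lists the kinds of features but I’m not presenting this as its formula.

```python
import math
from dataclasses import dataclass


@dataclass
class PosterStats:
    followers: int        # number of followers the poster has
    posts: int            # number of posts they have made
    avg_responses: float  # average responses received per post
    shares: int           # number of people who shared or retweeted their posts


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"followers": 0.4, "posts": 0.1, "avg_responses": 0.2, "shares": 0.3}


def poster_score(stats):
    """Weighted sum of log-scaled features, so huge counts don't swamp the rest."""
    return sum(
        weight * math.log1p(getattr(stats, name))
        for name, weight in WEIGHTS.items()
    )


# Made-up posters for comparison.
casual = PosterStats(followers=50, posts=200, avg_responses=0.5, shares=3)
reporter = PosterStats(followers=20000, posts=5000, avg_responses=12.0, shares=900)

print(poster_score(casual) < poster_score(reporter))
```

In practice a score like this would presumably be only one input alongside the textual features mentioned above, and the weights would more likely be learned from data than set by hand.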


32 thoughts on “Do Search Engines Use Social Media to Discover New Topics?”

  1. I guess Microsoft already gets the data from FB. And Google from G+. But the issue with reporting breaking news from tweets/micro-blog posts is: how is their authenticity verified? If such information is found on blogs, the blog authority can always be used. But with social media networks, it’s going to be difficult to establish a criterion to filter out, say, fun-loving authors who want to garner some attention.

  2. Hi Bill,

    I have to be honest and say that these types of patents really excite me and at the same time scare me. I think all SEOs will agree that we still see paid links working and easily manipulating the SERPs without penalty, so I imagine paid shares will be able to do the same thing.

    My takeaway from this is more to do with ranking fresh content and things that could affect a QDF query and not necessarily a long standing (timeless) web page.

    Interesting stuff as always.

  3. I agree that it is important for status updates and tweets to be indexed quickly because many URLs of important resources are shared in real time and conventional crawlers may not find these resources as quickly. I think you are correct in assuming that the social networking features should take into account number of followers, number of posts and number of likes or retweets.

    This definitely answers the question of how social media has influenced traditional crawling, at least for Yahoo.

  4. Very true. I just follow the Twitter feeds for my fave sites and when they have something new on the topics I’m most interested in, it’s on Twitter that I get the news first. I only have to click on the links. Anyway, wasn’t this already addressed by Google when it updated its search engine in response to how tweets and other real-time news sources were changing the web back in 2009? Wasn’t this all what Caffeine was capable of doing, IIRC?

  5. Interesting analysis of a trend that seems likely to become more important as time goes on. We’ve all seen how social media has changed people’s behavior when breaking news is happening — I “watched” the devastating earthquakes in New Zealand last year unfold on Twitter, because the U.S. media didn’t spend (what I felt was) enough time on them, and I’ve done the same thing with other topics that interested me. If I do it, how many others are, too? No wonder the search engines are taking notice!

  6. Your question about whether looking at Yahoo patents was still a good idea brought back a recollection of how in 2010 Shashi Seth protested that Yahoo was not getting out of search in a post on the Yahoo Search Blog. Just went back to check out the Yahoo Search Blog. Only a single post since September 27th (and it is a “must read” about Halloween related search trends). Hmmmm

  7. Hi Raj,

    With the Yahoo/Microsoft agreement in place, Microsoft would have access to the technology described in this Yahoo patent as well.

    What this describes isn’t reporting upon the tweets or status reports themselves, but rather using those to find news from reliable or authentic resources that might not have been linked to on other sites on the Web yet, and so wouldn’t be found quickly by a web crawler.

  8. Hi Ross,

    I usually like this type of patent a lot because it describes a very reasonable process that the search engines might take, and it’s from the search engines themselves.

    I see some sites getting away with paid links, while others get penalized or possibly disappear once the paid links are discovered. We’ve had some very high profile examples of that over the past year.

    I don’t believe that Yahoo uses the term QDF, but the basic concept is probably very similar – for topics that people are suddenly paying a lot of attention to, such as those that might be found in trending topics at social media sites, it really can benefit a search engine to uncover relevant results for those and overcome the “zero recall” problem.

  9. Hi jhian

    This post has nothing to do with paid links. Since the search engines all tell us to avoid paying for links for the purpose of attempting to manipulate search results, it’s possibly not the wisest of choices under any SEO strategy that’s looking for long term value.

  10. Hi Mike,

    I’m not sure that it answers all the questions that we might come up with, but it does provide possibilities for a good number of them.

  11. Hi Perry

    Twitter is a pretty good way of finding very recent topics, and I’m sure that the search engines would love to be able to use it and other social networks to not only uncover those topics, but help them find resources about those topics as well.

    Google did offer a real time search that would include things like tweets, but it wasn’t necessarily using twitter to uncover URLs or webpages that might provide more information.

    Google recently acquired Wowd, which looks at actual clicks on links from people who have the Wowd application (or something similar) on their browsers. I wrote about that in Wow! Google Acquires Wowd Search Patents. An approach like that would cut out the social media middle man, show which topics and URLs people have been clicking upon more frequently on a real time level, and help Google discover resources for “recency-sensitive” queries as well.

  12. Hi Mrs_H,

    I sat at my desk recently, and tweeted about an earthquake I was experiencing as it was happening. Upon reflection I might have been better off fleeing to outside, but I didn’t.

    It makes a lot of sense for the search engines to try to capture this kind of information as quickly as possible. As I mentioned in the start of this post, I think social media has changed the expectations of searchers, and if I read about something happening in Twitter, I’m going to expect to be able to find out more in Google afterwards.

  13. Hi Randy,

    Shame that the Yahoo search blog has slowed to a trickle with news. I do know that a lot of the search engineers who were at Yahoo are now at Microsoft (with a few exceptions who seem to have found homes at Google). But Yahoo still has a number of smart and talented search engineers in their ranks, and many of the patents that I’ve run across from them are as good as anything I’ve seen from Google and Microsoft. It would be a shame to see what they’ve developed go to waste.

  14. Interesting read Bill, this is what has been boggling my mind for quite some time. I have never quite understood the role of social media in SEO: how do the search engines evaluate the relevancy of pages based on the number of followers, shares and retweets, when all the links on these sites are nofollow?

    What I feel is that social media must have a whole ranking metric for itself. But what I want to know is how important these social media sites are to the search engines. How do the search engines decide between two pages, one which is involved in social media and one which isn’t? How do they compare them?

  15. Hi Bill.
    My web designer has been teaching me a little about SEO for my new kitchen website. I can’t afford to hire a pro just yet. My designer gave me your link to help me. I’m finding some of it a bit out of my league but I wanted to just say a quick thank you for freely providing so much information. I wondered if you or any of your readers could point me towards a beginners resource that provides good accurate info. I know it’s a bit cheeky of me but I find a lot of conflicting information on the subject. Just want to make sure that I get it right.

  16. Hi Akash,

    The “nofollow” on links in social networks is pretty much there to stop people from posting spammy links to those services in the hope that they will directly impact the rankings of those pages. Of course, it’s unlikely that those links would have much long term value based upon something like a PageRank model anyway since they often aren’t linked to directly.

    There are other ways to rank social media; some of it may rely upon the text used, the novelty of those “microblog” posts, and some of it may rely upon some kind of reputation score for the person who posted it. There have been some places (patents, whitepapers, presentations) where Google has written about a “user rank” which may be in use in Google’s Confucius Q&A service, and possibly even part of Google Plus at this point, which involves a score for posts based upon the poster’s contributions to the service as well as their meaningful interactions with others in ratings and responses.

    The value of social media posts is often in their ability to provide recent and timely information, unlike non-social media pages that act as informational resources. So a social media microblog post might be ranked in part based upon things like freshness, while a non-social media page might be ranked in part based upon things like links to that page.

  17. Hi Darren,

    Thank your designer for the reference to my site.

    There are resources out there that you might find helpful that I would recommend. To begin with, it doesn’t hurt to start looking through all of the help pages and resources that the search engines provide, as well as the help forums. The webmaster help forums are at:

    http://www.google.com/support/forum/p/Webmasters?hl=en

    It’s not a bad idea to spend a little time there everyday, and see what kinds of problems other webmasters have, and what kind of responses they receive from the people who post there. While some of those might not apply to what you’re doing on the Web, you may find many that do.

    I’d also recommend keeping an eye on the blogs from Google as well. Here are links to two of the main ones:

    http://googleblog.blogspot.com/
    http://googlewebmastercentral.blogspot.com/

    Since you’re just starting out with SEO, I’d highly recommend Google’s SEO Starter Guide (pdf) as well:

    http://static.googleusercontent.com/external_content/untrusted_dlcp/www.google.com/en/us/webmasters/docs/search-engine-optimization-starter-guide.pdf

    There’s a lot more than that to SEO, but it’s a pretty good place to begin and learn some of the basics.

  18. “How much has social media influenced the expectations of searchers, and forced search engines to change?”
    I think that at the moment not much, but in a few years Social Media will be one of the most important keys to doing good Search Engine Optimization.

  19. I believe the tweets, likes, shares, diggs, stumbles definitely work on the positive side and that’s what we call social media marketing. It’s simple: you make more noise, you get noticed, and that’s how you grab Google’s attention too.

  20. Hi Mike,

    I do think that social media has increased the expectations of at least a decent percentage of the population in finding information about things that have happened very recently, that there may not be much online about yet.

    I do agree with you that its role will increase tremendously in the future.

  21. Hi David,

    I think there’s also something to be said for the quality of what you present via social media in addition to the volume that has an impact. :)

  22. Hi Bill,

    Thanks for the insights, always love your response. I am always astonished at how convincing your thoughts and advice are.

  23. Thanks, Akash.

    A large part of the fun and enjoyment I get out of blogging are the comments that I receive and the chance to respond.

    I don’t like it when I spend a lot of time writing a comment somewhere else and no one responds to it, but I have no control over that. I do have control over responses here though. :)

  24. very nice collection of links Bill! Very good email draft to get rid but also help those clients that don’t want to pay – they’ll be back after one year of trying lol

  25. Hi Ron,

    I’m guessing that you’re referring to this comment above in response to Darren.

    I’m more than happy to point people towards resources that are aimed at beginners, and I don’t think that it hurts webmasters and site owners to learn as much SEO as they can. :)

  26. Right now I still think that Google has to treat Social Media ranking signals on a separate “plane” than the “regular” signals as so many sites and vital pieces of information in the web are not shared in SM at all.
    Example: There are 2 websites of a doctor – one blogs and tweets about it, the other just writes pages (no RSS, no Tweets). So if those 2 sites have a similar backlink profile and content (private practice type of thing) why should Google put the social one at the top?
    The solution is to put certain phrases, businesses, websites and so forth into the social “plane” and all the others in the classic plane (where sharing is NOT caring) ;-)

    Also Google has to implement a klout-like score (is klout Google’s next buy?) to exclude bought shares which are usually done by bots/fake accounts that follow certain patterns.

  27. Hi Andreas,

    We have had statements from Google through the course of this year, especially when they stopped showing real time search, that they would be heavily experimenting with how to incorporate social signals into Web search.

    In your example, one doctor participates in social activities and the other doesn’t, but most other aspects of their sites are very similar.

    Google has been experimenting with a user rank that was developed with their Confucius Q&A service and was worked upon more for their foray into an open social graph that could be used with Google Plus, and it is in many ways much more sophisticated than Klout. As Eric Schmidt has noted a few times in public over this past year, Google Plus is an identity service. With things like authorship markup, and author profile pictures showing up in search results, we can see and learn more about the authors of pages and articles and blog posts on the Web. That activity, meaning contributions to social networks and meaningful interactions with others on those networks, can be used in a score that shows off expertise in different areas and authenticates that content.

    All things considered, if the owner of a website participates in social networks in meaningful ways, connects his website to a social network like Google Plus, and exhibits that he is an expert, he is authentic, and he is an authority, that may impact web rankings for his website in a positive way, and could cause his pages to rank ahead of the doctor who has a very similar website.

    When Google showed real time results based upon social media activity, and social results that let people see within web results the activities of people they were connected to online, that was a separate plane from the “regular” signals we are used to. Google has now started to show things like author profile pictures and numbers of circles people are in to anyone, regardless of whether those searchers are on Google Plus or not, and chances are that those can or will be associated with ranking signals that impact placement within Web search results.
