Do Search Engines Love Blogs? Microsoft Explores an Algorithm to Increase PageRank for Pages Linked to by Blogs

Last December I wrote a blog post titled Do Search Engines Hate Blogs? Microsoft Explores an Algorithm to Identify Blog Pages. The inventors behind the patent filing described in that post have come out with a new patent application that says some positive things about blogs. Looking back at the original post, it appears that they may not hate blogs at all.

In the new patent document, they ask if the rankings of web pages in search results would be improved by providing a slight increase in the PageRank of pages linked to by blogs. They tell us that:

This idea is based on the assumption (or hope) that blogs are still mostly human-authored, and that links from blogs generally represent sincere endorsements on the part of the authors.

The December post explored how a search engine might be able to identify blog pages and distinguish them from non-blog pages, and told us that:

Search engines are increasingly implementing features that restrict the results for queries to be from blog pages.

But limiting the number of blogs that show up in search results doesn’t necessarily mean that a search engine doesn’t like blogs. It may mean that search engines would prefer to show a diversified set of search results, including blog pages and other results.

Ranking Algorithms

Search engines often look at a couple of different kinds of ranking factors when determining the order in which search results are shown to searchers.

Query-Independent and Query-Dependent

One way to classify ranking algorithms is as query-dependent (or dynamic) or query-independent (or static).

Query-dependent ranking algorithms rely upon the query terms someone uses to rank pages, while query-independent ranking algorithms look at other factors, such as how important a page appears to be based upon whether or not important pages link to it. PageRank is an example of a query-independent ranking algorithm.

Query-independent ranking algorithms assign a quality score to each document on the web, and can be run ahead of time. Query-dependent ranking algorithms depend upon the query used, and have to be run when a user submits a query.

Content, Usage, and Link Based Ranking Algorithms

It’s also possible to classify ranking algorithms as content-based, usage-based, and link-based.

Content-based ranking algorithms – use the words in a document to rank the document among other documents. For instance, a higher score might be assigned to a document that contains the query terms at the beginning of a document, in a prominent font, or in a certain kind of HTML element.

Usage-based ranking algorithms – may assign a score based on estimates of how often documents are viewed from looking at web proxy logs or looking at click-throughs on search engine results pages.

Link-based ranking algorithms – look at the hyperlinks between web pages to rank those pages, assigning a score to a page based upon the links pointing to it, with each link treated as an endorsement of the page.

PageRank – an example of a query-independent link-based ranking algorithm.

The PageRank formula is often explained as follows. Consider a web surfer who is performing a random walk on the web. At every step along the walk, the surfer moves from one web page to another, using the following algorithm.

With some probability d, the surfer selects a web page uniformly at random and jumps to it; otherwise, the surfer selects one of the outgoing hyperlinks in the current page uniformly at random and follows it. Because of this metaphor, the number d is sometimes called the “jump probability,” namely the probability that the surfer will jump to a completely random page.

If the web surfer jumps with probability d and there are |V| web pages, the probability of jumping to a particular page is d/|V|. Since any page can be reached by jumping, every page is guaranteed a score of at least d/|V|. The PageRank of a particular web page is then the fraction of time that the random surfer will spend at that page.
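The random surfer description above can be sketched in a few lines of Python. This is only a minimal illustration and not code from the patent filing: the function name `pagerank`, the tiny sample graph, the jump probability of 0.15, and the choice to treat pages without outgoing links as if the surfer always jumps are all my own assumptions.

```python
def pagerank(links, d=0.15, iterations=50):
    """Estimate the fraction of time a random surfer spends on each page.

    links maps each page to the list of pages it links to.
    d is the jump probability described in the text (an assumed value).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start the surfer anywhere

    for _ in range(iterations):
        # Jumping: every page is guaranteed d / |V|.
        new_rank = {p: d / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Following a link: split the rest equally among outlinks.
                share = (1 - d) * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: with no outlinks, the surfer jumps at random.
                share = (1 - d) * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web used only for illustration.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(example_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this sample graph, page "d" has no incoming links, so its score stays at the d/|V| floor, while page "c", which collects links from three pages, ends up with the largest share.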

But what if that surfer started favoring pages that were linked to by blogs a little more?

Splitting PageRank

One of the problems behind using PageRank is that some commercial web sites try to inflate PageRank by creating links that point to a page solely for the purpose of endorsing that page, artificially increasing the value of the page.

This patent filing describes in some detail how a portion of PageRank from a page might be split (or distributed) equally amongst the links found on the pages of a site, and how the distribution of PageRank could be slightly altered to favor (or show a bias towards) pages that are linked to by blogs.

If blogs are, as the authors note in the patent, “still mostly human authored, and generally represent sincere endorsements of their authors,” then this bias might help counteract the artificial inflation of PageRank scores by people who create links pointing to pages solely for the purpose of increasing the PageRank of those pages.
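One way to picture that bias is to change where the random surfer lands when jumping, so that pages linked to by blogs are chosen a little more often than other pages. The sketch below is my own simplified illustration, not the method claimed in the patent filing: the function name `biased_pagerank`, the `bonus` weight, the jump probability, and the sample graph are all hypothetical.

```python
def biased_pagerank(links, blog_pages, d=0.15, bonus=1.0, iterations=50):
    """PageRank where random jumps favor pages linked to by blogs.

    links maps each page to the pages it links to.
    blog_pages is the set of pages already identified as blogs.
    bonus is a hypothetical extra jump weight for blog-endorsed pages.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    # Pages that receive at least one link from a page flagged as a blog.
    blog_endorsed = {t for b in blog_pages for t in links.get(b, [])}

    # Jump distribution: uniform weights, plus the bonus for endorsed pages.
    weights = {p: 1.0 + (bonus if p in blog_endorsed else 0.0) for p in pages}
    total = sum(weights.values())
    reset = {p: w / total for p, w in weights.items()}

    rank = dict(reset)
    for _ in range(iterations):
        # Jumps now land according to the biased distribution.
        new_rank = {p: d * reset[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = (1 - d) * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no outlinks always jumps.
                for p in pages:
                    new_rank[p] += (1 - d) * rank[page] * reset[p]
        rank = new_rank
    return rank


# Hypothetical graph: "x" is linked to by a blog, "y" by a non-blog site.
example_links = {"blog": ["x"], "site": ["y"], "x": [], "y": []}
biased = biased_pagerank(example_links, blog_pages={"blog"})
unbiased = biased_pagerank(example_links, blog_pages={"blog"}, bonus=0.0)
print("biased:", round(biased["x"], 3), round(biased["y"], 3))
print("unbiased:", round(unbiased["x"], 3), round(unbiased["y"], 3))
```

With `bonus=0.0` the jump distribution is uniform and the two pages tie, since they sit in mirror-image positions in the graph. With a positive bonus, the page linked to by the blog pulls ahead.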

The patent filing is:

Ranking Method using Hyperlinks in Blogs
Inventors: Steve Chien and Dennis Fetterly
Assigned to Microsoft
US Patent Application 20080243812
Published October 2, 2008
Filed March 30, 2007

Abstract

A method for static ranking of web documents is disclosed. Search engines are typically configured such that search results having a higher PageRank® score are listed first. A modified scoring technique is provided whereby the score includes a reset vector that is biased toward web pages linked to blogs. This requires identifying web pages as either blogs or non-blogs.

Identifying Blogs

Some of the kinds of things that a search engine crawling program might look at when deciding whether a page is from a blog might include:

  1. Whether a page is hosted in a known blog hosting DNS domain such as blogspot or wordpress.com
  2. What features appear in the words and phrases of the page that are not part of its HTML markup
  3. What the targets of outgoing links might be in the page, and
  4. Whether the string “blog” occurs in the URL
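The four signals in the list above could be gathered with something like the following. This is a rough sketch under my own assumptions rather than the classifier from the patent filing: the host list, the term list (borrowed from the earlier "Do Search Engines Hate Blogs?" post), the function names, and the simple vote threshold are all hypothetical stand-ins.

```python
from urllib.parse import urlparse

# Assumed examples of blog hosting domains and blog-like phrases.
BLOG_HOSTS = ("blogspot.com", "wordpress.com")
BLOG_TERMS = ("permalink", "blogroll", "powered by", "trackback")


def on_blog_host(url):
    """True if the URL's host is, or is a subdomain of, a known blog host."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in BLOG_HOSTS)


def blog_signals(url, page_text, outlinks):
    """Collect the four kinds of signals for one page."""
    text = page_text.lower()
    return {
        "blog_host": on_blog_host(url),
        "blog_terms": sum(term in text for term in BLOG_TERMS),
        "links_to_blog_hosts": sum(1 for link in outlinks if on_blog_host(link)),
        "blog_in_url": "blog" in url.lower(),
    }


def looks_like_blog(signals, threshold=2):
    """Hypothetical rule: call it a blog if enough signals agree."""
    votes = (
        int(signals["blog_host"])
        + int(signals["blog_terms"] >= 2)
        + int(signals["links_to_blog_hosts"] > 0)
        + int(signals["blog_in_url"])
    )
    return votes >= threshold


signals = blog_signals(
    "http://example.blogspot.com/2008/10/post.html",
    "Permalink | Trackback | Powered by Blogger",
    ["http://other.wordpress.com/"],
)
print(signals, looks_like_blog(signals))
```

A real system would more likely feed signals like these into a trained classifier than count votes against a fixed threshold; the threshold here just keeps the example short.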

Experimenting with a Bias Towards Pages Linked to by Blogs

The authors of this patent performed experiments where they downloaded over 472 million pages, and found links to an additional 6 billion pages within those pages.

They recalculated the PageRank of these pages using a bias towards pages that they identified as being linked to by blogs, with a preference towards using blog pages that had higher PageRanks, which they tell us tend to be “frequently updated, more informational rather than personal, and free of spam.”

They also tell us that some other characteristics of blogs may prove useful in refining this technique, such as looking at the number of subscribers to a particular blog, and associating a higher endorsement value to blogs with greater numbers of subscribers.
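To make the subscriber idea a little more concrete, here is one hypothetical way that subscriber counts could feed into the jump weights for pages linked to by blogs. The patent filing, as described here, does not give a formula, so the logarithmic scaling, the function name `subscriber_weighted_reset`, and the sample numbers are purely my own assumptions.

```python
import math


def subscriber_weighted_reset(pages, blog_links, subscribers):
    """Build a jump distribution that favors pages endorsed by popular blogs.

    blog_links maps each blog to the pages it links to.
    subscribers maps each blog to its subscriber count.
    """
    weights = {p: 1.0 for p in pages}  # every page keeps a base weight
    for blog, targets in blog_links.items():
        # Assumed scaling: endorsement grows with the log of subscribers.
        endorsement = math.log1p(subscribers.get(blog, 0))
        for target in targets:
            if target in weights:
                weights[target] += endorsement
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


# Hypothetical data: one widely read blog and one small blog.
reset = subscriber_weighted_reset(
    pages=["a", "b", "c"],
    blog_links={"big_blog": ["a"], "small_blog": ["b"]},
    subscribers={"big_blog": 10000, "small_blog": 10},
)
print({page: round(weight, 3) for page, weight in reset.items()})
```

The logarithm is there so that a blog with a thousand times more subscribers does not completely drown out smaller blogs; a different scaling could be swapped in without changing the rest of the sketch.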

Conclusion

Is sending more PageRank to pages that are linked to by blogs something that will increase the relevance and importance of pages that show up in search results? Are links to pages from blogs still actual endorsements from the authors of those blogs?

Do search engines love blogs?

41 thoughts on “Do Search Engines Love Blogs? Microsoft Explores an Algorithm to Increase PageRank for Pages Linked to by Blogs”

  1. Towards the end you mention whether the string “blog” occurs in the URL. I wonder if that would affect the value of domains that include the term in the domain name. Also, I had not given much thought to hosts like blogspot or wordpress. I wonder if that would carry more weight than a blog hosted with a standard host.

  2. Hi Michael,

    It may be possible that including the word blog in your domain name may make a difference, especially if people link to your site using the word “blog” in the link – we can’t ignore the possibility that a search engine will look at related and relevant anchor text pointing to a page to understand more of what it is about.

    The “Do search engines hate blogs” post that I linked to at the start of the post does go into a little more detail on some of the things that a search engine might look at to determine if a page is from a blog, such as looking for words like: “permalink”, or “blogroll”, or “powered by”, or “trackback”, “comment”, or seeing if it has an RSS feed, and some other signals.

    I think that being hosted at a site that typically only hosts blogs may be a fairly strong sign that a page is from a blog, but they do need to look at other signals as well, especially for pages that aren’t on one of those hosts.

  3. I have noticed for some time now that there is a rise in traffic to the web pages that I link to from my blog, shortly after posting new entries on the blog itself.

  4. Interesting observation, Peoplefinder. :)

    I’m assuming that the traffic is in addition to any traffic that might be visiting those pages through the link from your blog. I wonder if that additional traffic might be attributed to a link from a blog, or a fresh new link pointing to the page regardless of the type of source.

  5. Hi Bill,

    Yes. The traffic to the pages linked to from the blog is usually disproportionately larger than the increased traffic to the blog itself.

    I usually notice a measurable spike for several days to the linked pages after a fresh blog post. The traffic spike then subsides if I don’t make a new post for a while.

    Although, as we all know, correlation != ( does not equal ) causation.

    It just seems odd. I have been noticing this for a while now. I will continue to monitor it.

  6. Wait a minute, this is making no sense… or is it just me who doesn’t understand? Why is Microsoft messing with Google’s PageRank?

  7. As a former research scientist, I have to say this work is flawed, based on their statement about blogs
    “still mostly human authored, and generally represent sincere endorsements of their authors,”

    Firstly, they seem to forget that humans do not always write articles on a blog because they sincerely believe them to be true. There are entire business models built on paying someone to write nice things about you in a blog to boost your site’s sales (or on getting paid to write nice things on a blog)!

    Secondly, a blog is a web page. Just like any other web page. I choose not to have a blog – does that suddenly mean anything I write or link to from one of my web sites is now less important than if I did use blogging software?

    Thirdly, blogging software was invented, remember, for those who couldn’t design and build a web page if their life depended on it. So why on earth does that make the content of a blog better than the content on a web site hand-crafted by someone who can design and build one?

    Fourthly if, as their statement “still mostly human authored, and generally represent sincere endorsements of their authors,” implies, web pages created by non-human methods are less reliable than those created by humans, would it not be a MUCH better idea to detect those sites created by automated systems which just scrape content from elsewhere, and downgrade them, rather than upgrade just SOME of the human generated content (the bits that appear in blogs)?

    Finally, as I said before, a blog is a web page, that’s all. Nothing special at all.

    PS For the benefit of Mat, I suspect they are using the term pagerank as a generic term for ranking pages, rather than referring to Google’s own patented PageRank method (which of course is only one of about 200 ways Google scores a page to generate its search results).

  8. Dr. John,

    Even though there may be business models behind blogs, they are still more relevant than some odd page popping up out of nowhere. Think about traditional media: the only reason for a lot of the things we hear, and the way they are presented, is that somebody is paying for it. That doesn’t mean that those news sources don’t have credibility. Rather, because they report on the latest happenings around the world, I will argue they have more credibility than an advertisement running during the Super Bowl.

    ~R

  9. Well, here is the thing: Microsoft search is on the rise. The plan of offering people money for using MSN search was a good idea; it boosted their usability. I have been using MS AdCenter for the last year, and I can see an increase in my clicks through AdCenter.

    On the other hand, blogs are something that indicates your social interactions and communication with the virtual world. Under these circumstances, every search engine should consider the importance of blogs as having an additional effect on a page’s rank.

  10. Hi Dr. John,

    I’m not sure that we can say that the work is “flawed,” but rather that it may be based upon assumptions that shouldn’t necessarily be made.

    1) There are some pay for review business models that have paid some bloggers for reviews of their sites and we can question how sincere they are as endorsements.

    2) A blog is like any other web page, but I don’t believe that the authors are saying that there is anything less trustworthy about pages that aren’t blogs – rather, they may find it easier to identify pages that are blogs, and some blogs, like those with high pageranks, might be indicators of trust in the sites they link to.

    3) Again, the authors are looking at blogs not because they are any more trustworthy than sites that aren’t blogs, but because it might be easier to distinguish a blog page from one that isn’t a blog using an automated program or machine learning tools. It has nothing to do with the ability of the person posting at a blog to be able to use HTML or program or design.

    4) It’s quite likely that many of the factors that distinguish a blog from an autogenerated splog can be identified also. There are a number of Microsoft research papers on the topic. One of the listed inventors on this patent filing wrote a few of those, including Spam, Damn Spam, and Statistics (pdf)

    The use of pagerank might be either as an illustration of a similar ranking method, or the use of pagerank itself. As I mentioned in a response to Matt, Google licenses the pagerank technology, and is not its owner.

  11. Hi Peoplefinder,

    Thanks for sharing your observations. That is an interesting result. :)

    Hi Matt,

    Good questions. :)

    Actually, PageRank is patented by Stanford and not Google – Method for node ranking in a linked database

    There’s a copy of a restated licensing agreement for that patent and the subject matter that it covers between Google and Stanford. I don’t know if it is updated, but at some point in time (2011?), the exclusive licensing agreement runs out. Regardless, that doesn’t mean that others can’t patent inventions like the one listed above, even if they aren’t Google or Stanford.

  12. Hi Rajat,

    Many blogs do express the authentic human voices of their authors, and express sincere endorsements with links – but many times links are references instead of endorsements, and that is one of the fundamental flaws of pagerank itself.

    As Tim Berners-Lee noted in The Implications of Links:

    The intention in the design of the web was that normal links should simply be references, with no implied meaning.

    A normal hypertext link does NOT necessarily imply that

    One document endorses the other; or that
    One document is created by the same person as the other, or that
    One document is to be considered part of another.

    The language surrounding a link may carry an endorsement, but the act of linking by itself allows for a reference to the linked document.

    I do agree with you that many blogs do contain value and credibility. It’s interesting that the authors of the patent filing seem to be trying to look at things like number of RSS subscribers to a blog, or the amount of pagerank to the homepage of a blog to establish how credible those might be.

  13. Hi MGa,

    It was interesting to see this patent filing come from Microsoft. They’ve been coming out with some pretty interesting patent filings, though I’m not sure what reflection that might cast on improvements to what they offer in terms of search results.

    I would like to hear more about the experiments that they described in the patent application on their tests on these methods, and how they determined whether or not search results were improved from this process involving blogs.

  14. Typically Microsoft….
    Time told us that Google indeed loves content based sites. It would be hard to imagine Google disliking blogs. A blog is all about content, it’s regularly updated (or so it should) and so far as I can see it is easier to rank a blog than a website. I also think that blogging is a great tool for people who have something interesting to say but do not know how to build a website. Just basing the rankings on the following, would be a very extreme measure.
    “- Words and phrases from the page, such as “permalink”, or “blogroll”, or “powered by”, or “trackback”, “comment”, “comments”, “blogad”, and “posted at” or similar terms, including non-English ones, that are commonly found on the pages of blogs.
    - if the web page contains an ATOM feed or an RSS feed.”

    Because this only shows the script that is used to build the site/blog and has nothing to do with quality/content/popularity whatsoever.

    I so agree with Dr. John’s comment :“Thirdly, blogging software was invented, remember, for those who couldn’t design and build a web page if their life depended on it. So why on earth does that make the content of a blog better than the content on a web site hand-crafted by someone who can design and build one?”
    I use WordPress on many domains now, simply because you can set up the whole thing in half an hour time and actually spend your time on content rather than layouts and getting your margins right….

  15. Hi Dave,

    Some good points – I do like to question some of the things that we see in patents from the search engines. One of the things that is worth keeping in mind is that we are often only given a limited view of a method described in a patent, and the system that might be built on top of what is illustrated in a patent filing could end up being significantly more sophisticated.

    I think the benefit of blogging software isn’t that you can use a blog without knowing how to create a site – the real benefit is the ease of use in creating pages, as you point out. That’s true of all content management systems. I don’t use blog software because I don’t know how to code in HTML – I use it because it makes it easier for me to create a new page without having to code a new page, or amend other pages to add links to the new page.

    There are many designers who use blogs, even though their coding and design skills are very strong.

    If we look back at the origins of Blogger, one of the early popular blogging software applications, the software was built as an internal communication tool for designers working together on another project – a programming and design project from Pyra Labs. Many other early blogs were static web pages from designers that were updated on a regular basis. Blogging software was aimed at making it easier to update pages rather than created for people who couldn’t “design and build a web page if their lives depended upon it.”

    One reason why this patent application focuses upon blogs is because blogs do have characteristics that distinguish them from other kinds of sites, and make them easier to identify.

    It might be a mistake to imagine that this patent filing means that a search engine would not look at other signals that might determine whether or not the pages of a site (or blog) are of high quality. For instance, the patent filing notes that they might look at the pageranks of blog pages, which is another signal of importance or quality.

    At its root, I think this is an interesting idea, and it creates the possibility of asking if other search engines are doing something similar.

    For example, Google carefully limits the sites that they include in Google News. Imagine if they might be doing something similar for links to web pages that appear within their pool of news sites. I’m not saying that they do, but I’m not sure that I would have considered that they might until I read this Microsoft patent application.

    Rather than reading something like this patent filing, and pointing out some potential flaws, there might be some value in exploring some of the implications that it might raise.

  16. I think that the Google Blog Search overall is not a bad idea. It can be very useful for a specific search. The same way that we all use Google News to search for something recent and newsworthy on the topic, we might like to get more opinions on topics and do a blog search.
    However, in generic searches all kinds of sources should be mixed together. Google does that very well. Under most keywords you will get a few news sources, a few YouTube videos, some images, a few blogs, a few commercial sites, etc.

    I just wonder. On what is google basing the blog search results now? The RSS feeds or do they use some patent as mentioned above to determine whether it is a blog or not?

  17. Hi Dave,

    Google may have provided some clues as to some of the ranking factors they may be looking at in Google Blog Search in a patent filed back in 2005. I wrote about it in Positive and Negative Quality Ranking Factors from Google’s Blog Search (Patent Application). They may look at number of subscribed RSS feeds, but they also listed a good number of other things to consider.

    Chances are that some of the ranking factors mentioned in that patent application are only examples and aren’t used, and that there are other ranking factors not mentioned that might be.

    As for the blended or Universal Search Results, Google provided us some information about how those may work in another patent filing which I wrote about in June, in the post How Google Universal Search and Blended Results May Work.

  18. As for websites that are linked to by blogs, I am worried that the value of a blog link is going to be depreciated, since more and more people are using this as a tool to get quick on-topic backlinks and bloggers allow them to. Blog comments can be very valuable, since they create extra content and can also freshen up some very old content in a blog. And I do think that collective intelligence is a big thing on the internet for now and in the future.
    I’m just worried that the whole series of “Great post, dude!”-comments that are actually approved by moderators are going to ruin the opportunities we have here not only to build great sources of information and opinion, but also for SEO purposes.
    Perhaps the do follow movement should be enhanced by a “useful comments movement” and allow only a do follow link if the comment adds value to the page.

  19. A question worth considering, Dave. :)

    It’s possible that links within a post on a blog page may carry different amounts of weight than links within a comment on the same page. Yahoo, Google, and Microsoft have all shown that they could possibly examine the visual layout of a web page, break it into blocks (or segments or parts of a template), and possibly provide different values for links from different parts of pages.

    If you haven’t had the chance to read Microsoft’s paper on Block-level Link Analysis (pdf) from 2004, it’s worth looking at.

  20. My Own Experience

    I had a PR0 blog that I took to PR3 in less than 2 months using bum marketing techniques. If they don’t love blogs then why am I at a PR3 with my blog while my website remains with a PR2 rating?

    Take care
    SEOGuy

  21. Hi SEOGuy,

    Thanks for sharing your experience.

    If they don’t love blogs then why am I at a PR3 with my blog while my website remains with a PR2 rating?

    Because you got some decent links pointing to your blog over that time period. PageRank is a metric involving links, and not how much Google likes or dislikes your site.

  22. Thanks for the analysis.

    I would agree with you if I wasn’t getting the same links also pointing to my website which hasn’t changed from its PR2 level.

    In the same 3 month period no change takes place on my website

    While my blog went from a PR0 to a PR3.

    Even though my marketing plan produces the same backlinks for both sites. To me that clearly demonstrates that Google sees blogs on a different (perhaps) higher level than websites.

    Take care
    SEOGuy

  23. Hi SEOGuy,

    Thanks. We know that the toolbar pagerank that we see isn’t an accurate indication of actual pagerank, but rather a snapshot of the pagerank for your front page at any point in time, and that it has been only updated 3 or 4 times a year in the past few years.

    The structure of your site, and its internal links may also play a role in how pagerank is distributed to pages of your sites. Since that is different from one site to another, it may have an influence in what you see in the toolbar.

    Other people may have linked to pages of your blog, but not to your web site, which could also play a part in the difference.

    It’s difficult to say, based upon your marketing plan alone, that Google is favoring one type of site over the other. There are potentially too many other factors involved. I’ve only named a couple. Regardless, it doesn’t hurt to continue to move forward and find new links, and build new relationships with others.

  24. I think it depends on the type of blog. My blog that has a “blog” on its domain has good PR and on top of search engines.

  25. Hi Jun,

    I agree with you. If a search engine is paying attention to where blogs are linking, I think they might be careful about the blogs that they are watching.

  26. Wow, your post really started quite a discussion. I am in a similar boat with seoguy. My blog outranks my site and is on page 3 for a major keyword, and my site is on page 4. However my website gets more traffic. Hard to figure out for me.

  27. Hi Danny,

    Thanks. I’ve seen that on a number of sites, where the associated blog outranks the homepage of the site in terms of PageRank and of where it shows up in search results.

    I’d have to look at your blog and website to get an idea of why the website would get more traffic, but there may be a number of reasons. For example, people may find the titles and meta descriptions/snippets more interesting on your website, and click through more often, or you may be getting a wider range of visitors for long tail terms that you may not be following too closely, tracking main keywords instead.

  28. Very interesting post. It would really change certain SEO strategies if blogs were given slight preference by search engines. I think that currently the links within the body of a blog post are given more authority than the links inside the comments, which makes sense because most of the comments are probably not placed by the author of the blog and therefore the links in the comments cannot be assumed to be endorsed by the author. On the other hand, the author has authority to moderate those links, so perhaps it does make sense if all the links on the page have equal credibility. In either case, I could see this being abused if it becomes reality, as people will try to “spoof” blog status of their pages by putting “blog” into the URL and adding blog-like supporting files to the directories. I guess we’ll see how this pans out.

  29. Hi Jeff,

    Thanks. You raise some interesting points. I’m leaning towards a number of the same conclusions that you are about the links within blog posts being given more weight than those in comments, based upon the amount of control that the author of the blog has over where those point. I think that the creation of the rel=”nofollow” attribute/value, and its quick endorsement by Google, Yahoo, and Microsoft, and inclusion in Blogger and WordPress showed a concern for how much control most bloggers take over the links found in comments.

    I do like the idea of more people adopting blogs, and trying to open up a conversation with visitors to their site, even though it’s possible that some might do that to “spoof” blog status, as you suggest might happen.

  30. I have been noticing more and more the value of blogs in the serps, It appears blogs are like the new love for most search engines, in a way like how the search engines loved directories in the past..

  31. Hi Timon,

    Thoughtful point. I’ve been seeing blogs in search results for years, but I’ve also been paying a lot of attention. I’m not sure if the search engines are loving them any more now than in the past.

    Instead, maybe there are more high ranking blogs than there were previously?

  32. Hello! Nice debate. In my opinion, search engines love blogs mainly because of their fresh content; if you make a blog and stop updating it for several weeks or so, that blog will probably die. I had a blog in another language that had a decent number of backlinks, and its traffic from search engines was about 100-120 visits per day. Now, after 2 months without writing anything, it gets around 20 visits a day.
    So in my opinion fresh content is the main cause that blogs are ranked better than static websites.

  33. Hi niceblogger,

    Good point. I do think that freshness is one of the things that the authors of this patent filing are concerned about. They did nail down the reasons why they might consider this approach in a short phrase. Blogs (that have accumulated some level of PageRank) tend to be: “frequently updated, more informational rather than personal, and free of spam.”

  34. Pingback: Raven SEO Weekly Digest - Issue 45 « Internet Marketing Blog
  35. Nice post man!
    I always thought blogs receive less preference from Search Engines until one of my blog pages on a blogspot blog surpassed the official site for a particular search term. I think well SEOed blog pages really do well in SERPs.

  36. Hi Robert,

    Thanks. Blogs do offer the opportunity to publish current thoughts and ideas and topics, and engage with others in a meaningful way. That can mean more traffic and links going to them than sites that are more static, and don’t change much.
