How Google Might Filter Blog Posts from Google Blog Search

Google was granted a patent yesterday on blog search, describing how the search engine might filter blog posts out of blog search results based upon a number of factors. The patent was originally filed in 2006, and it’s the first patent filing I’ve seen from Google that uses the term “splog.” The screenshot from the patent below shows some of that potential filtering process.

A flowchart from a Google patent showing some of the aspects of a blog post that might cause it to be filtered out of blog search results

I’ve written a couple of posts in the past about how Google might be ranking blog posts based upon other patent filings from Google, including Positive and Negative Quality Ranking Factors from Google’s Blog Search (Patent Application) in 2007, and How Google May Rank Blogs in 2010.

The patent application from the first post, Ranking Blog Documents, is still pending as of now, and the patent described in my second post, Indexing and retrieval of blogs, was granted at the time that I wrote about it.

Here’s the newly granted patent:

Providing blog posts relevant to search results
Invented by Kushal Dave, Joshua D. Mittleman, Kevin Scott, Vladislav Shchogolev, and David Alpert
Assigned to Google Inc.
US Patent 8,117,195
Granted February 14, 2012
Filed: March 22, 2006

Abstract

A device identifies a search result document based on a search query, and searches a blog post repository to identify a blog post relevant to the search result document.

The device also rejects the blog post if the blog post has insufficient length, contains outgoing links located a predetermined distance from the beginning of the blog post, has a large out-degree, was created before or after a predetermined time, or has incoming links with a low link-based score.

The device further provides the blog post in connection with the search result document if the blog post was not rejected.

The patent primarily describes how Google might filter out certain blog posts from being included within its database of posts that might be returned to searchers in a blog search, and also describes how additional information from some posts might be presented by a search engine.

It’s pretty upbeat about blogs themselves, but recognizes that search engines don’t always make it easy for searchers to find blog posts:

Blogs may often provide useful information about a search result, such as honest reviews, contrasting opinions, links to related material, etc.

Unfortunately, search engines do not display blog posts that are relevant to a specific search result, making it difficult to find blog posts containing information useful to a search query.

Undesirable Blog Content

When someone performs a web search at Google, one of the options on the search results page is to click a link in the sidebar to see blog search results. Those results are drawn from a data repository that contains information about blogs. But that repository doesn’t capture every blog post.

Many posts that are considered “undesirable” might be filtered out. The patent provides the following examples of the kind of content that might cause a blog post not to be included in the blog repository:

  • Profanity
  • Pornography
  • Racism
  • Spam
  • Stolen content
  • Chain letters
  • Viruses
  • Spyware
  • Fraudulent solicitations
  • Unwanted pop-up advertisements
  • Etc.

The patent isn’t just about possibly filtering out posts that might contain certain types of content, though. It also points to certain possible rules that might be used to filter out other types of undesirable posts:

One example of an undesirable blog is a spam blog, sometimes referred to by the neologism “splog.” Splogs may include blogs which the author uses only for promoting affiliated documents (e.g., documents linked to by the splog).

The purpose of a splog may be to increase the link-based score of affiliated documents, get advertising impressions from visitors, and/or use the blog as a link outlet to get new documents indexed.

The content on a splog may often be nonsense or text stolen from other documents with an unusually high number of links to documents associated with the splog creator which are often disreputable or otherwise useless documents.

In addition, blog posts might be filtered out of the repository based upon a review of the outgoing links of the remaining posts, such as links to content filled with profanity or pornography.

Rules for Filtering Undesirable Content

Here are some of the rules that might be used to remove blog posts from blog search:

Number of outgoing links – If a post has more than a certain number of outgoing links, which might be a predetermined number, such as fifty, then it may be removed. Those outgoing links could possibly include advertisements.

The number of links for that initial threshold might not be a predetermined amount, but might instead be determined by a statistical model, based upon a machine learning approach, to find a number of outgoing links that might provide a good tradeoff between accepted blog posts and rejected blog posts.

If a post doesn’t go past the threshold of outgoing links, it might next be checked to see if it has any incoming links.

Lack of incoming links – If there are no incoming links for a post, it might also be rejected. We’re told:

For example, a blog post may have zero incoming links because the blog post does not contain any useful information and nobody is interested in it. Such a useless blog post may be removed from the repository.

Link score threshold – If there is at least one incoming link to the post, a link-based score for the link might be calculated for any links pointing to the post. If that link score pointing to the post doesn’t attain at least some minimal level, a post might not be included in the blog repository.

This link based score might be increased by incoming links to the post, and decreased by outgoing links to other documents.

Lack of Title – If the link-based score is high enough, the next step might be to determine if the post has a title. If it doesn’t have a title, it might be rejected:

For example, a blog post without a title may indicate that the blog post is not trustworthy and/or contains undesirable content. If the blog post has a title, then the blog post may remain in the repository and not be rejected.

Links to self or same domain – Blog posts with links to the same domain, whether to the post itself or other pages on the same domain, might also be removed from the repository, though the patent tells us that those links within the same domain might be ignored instead.

Links to electronic media – Posts with links to electronic media, such as images, movies, or audio, might also possibly be rejected. It’s not stated in the patent, but rejection might be based upon the type of media being linked to, like the kinds of undesirable content listed above.

Sufficient Length – If a post isn’t of sufficient length, it might also be removed. While that length might be a certain number of words, for instance, it might also be an amount determined by a machine learning process.

Distance of links from start of post – If the outgoing links in a post don’t appear within a certain predetermined distance from the start of a post, it might also be rejected. This appears to be intended to avoid posts that might contain too many links.

Recency of posts – Posts that are older than a certain predetermined amount of time, such as 2 weeks, might not be included in search results. Those recent posts might also need to have a certain link-based score to be presented as well.
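Taken together, the rules above read like a simple accept/reject pipeline. Here’s a minimal Python sketch of how such a filter chain might look. To be clear, this is my own illustration, not code from the patent: the `BlogPost` fields, the rule order, and every threshold value are hypothetical, since the patent mostly leaves actual numbers to a predetermined setting or to a statistical model.

```python
from dataclasses import dataclass


@dataclass
class BlogPost:
    title: str
    word_count: int
    outgoing_links: int          # total out-degree of the post
    incoming_link_score: float   # link-based score from incoming links
    age_days: int
    first_link_offset: int       # characters from start of post to first link


# All thresholds below are assumptions for illustration; only the
# "fifty" outgoing links and "2 weeks" figures appear in the patent.
MAX_OUTGOING_LINKS = 50
MIN_LINK_SCORE = 0.1
MIN_WORDS = 50
MAX_AGE_DAYS = 14
MAX_LINK_OFFSET = 500


def accept(post: BlogPost) -> bool:
    """Return True if the post would stay in the blog repository."""
    if post.outgoing_links > MAX_OUTGOING_LINKS:
        return False  # out-degree too large
    if post.incoming_link_score < MIN_LINK_SCORE:
        return False  # no, or only low-scored, incoming links
    if not post.title:
        return False  # missing title
    if post.word_count < MIN_WORDS:
        return False  # insufficient length
    if post.first_link_offset > MAX_LINK_OFFSET:
        return False  # outgoing links too far from the start of the post
    if post.age_days > MAX_AGE_DAYS:
        return False  # too old to be shown among recent posts
    return True
```

A post only has to fail one check to be rejected, which matches the flowchart’s one-test-at-a-time structure; the patent also makes clear that any real system might use only some of these rules, or different ones entirely.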

Categories of additional filtering rules

While the filters described above might be used, the patent tells us that it might only use some of them, or it could consider other heuristics, or rules, as well. Those could fall into categories that might consider: topicality, quality, freshness, and/or significance.

Topicality would involve whether a blog post is really discussing a query that it might be a search result for.

Quality could include whether a blog post is “well written, information rich, and/or generally useful.”

Freshness could be based upon a determination of whether a post is “recent and/or provides timely information.”

Significance involves whether the information provided by a post is important.

Some other heuristics might include:

  • How many people subscribe to a blog post
  • Whether a post has a particular political slant, such as conservative, liberal, and/or moderate
  • If a post “expresses an opinion about a search result,” so that not all positive or negative or indifferent posts are shown to a searcher

The patent also provides some alternatives regarding how blog posts might be displayed in search results.

Takeaways

I find myself wondering if any of the listed inventors on this newly granted patent had been bloggers before the time that they wrote it. I find myself breaking a few of the rules above.

For example, it seems reasonable to link, like I did at the start of this post, to previous posts on this domain about how Google might be ranking posts in blog search.

I did write one post last summer where I provided a little over 1,000 links to patents at the USPTO that had been assigned to Google. It made sense to do so at the time.

In my last post, I embedded a video from YouTube (a Google property) of a presentation from Google’s Director of Research, Peter Norvig.

The idea that Google might explore a number of different heuristics to determine when to filter posts out of blog search makes sense though, and basing them upon categories such as topicality, quality, freshness, and significance feels right.

The last couple of heuristics I wrote about, involving whether posts might have political slants or express opinions, seem more like a decision to include diverse results based upon some kind of sentiment analysis.

I did perform a number of searches at Google’s blog search while writing this post, and I can’t say that I’m really satisfied with the results I received.


93 thoughts on “How Google Might Filter Blog Posts from Google Blog Search”

  1. Hi Bill,

    Really concerned here: when I look at the image from the flowchart and see the “Reject Blog Post” step, it seems that the majority of the blogs on the world wide web would be rejected.

    Also, linking to other posts within the website, or as I call it, content siloing, is something that a lot of SEOs have been practicing for a long time, the basic concept being that those links provide entry points for the search engine spiders to crawl more pages of the website.

    I am also not too convinced about other rules that they have defined in the patent.

    Overall, the whole patent seems to be heading in the wrong direction, and as you rightfully stated, the authors of the patent do not have blogs of their own, and that is why they are being so harsh.

    – Sajeet

  2. This is quite a worrying sign for new bloggers. Those that have built up credibility already should be OK, but new guys coming in won’t have many inbound links to their pages, and it’s going to be very difficult to get seen in the first place if your blog is filtered out by these arbitrary rules.

    It also seems to go against the recent freshness update, no? New blog pages will have many fewer links than older posts…

    Perhaps I’m just overreacting.

  3. I mourn the day WordPress switched to using Google Blogsearch rather than the then-useful, if not perfect, Technorati.
    Technorati eventually stopped innovating in that direction (as it wasn’t making them money), and Google’s results have only gotten worse over time.

    You would think indexing blogs would be looked on as an important part of real-time search, but with so many of the mainstream blogs in Google News, blog search became far less interesting for them.

  4. For example, a blog post may have zero incoming links because the blog post does not contain any useful information and nobody is interested in it. Such a useless blog post may be removed from the repository.

    What about new posts? Aren’t they going to pick up new posts with zero links? And if, let’s say, they index a post only when it gets a backlink, then don’t you think recency becomes a problem? For example, I write a post today, and it has no backlinks for 20 days, so it won’t be indexed; by the time it gets a backlink after 21 days, it’s too old to be indexed. Isn’t that confusing?

    Secondly, talking about the links in a post, do they mean ‘the post only’ or ‘the page containing the post’? Blog posts often have tons of comments, with lots of outgoing links on the post page, so which one is this patent referring to?

    Thirdly, about the outgoing links, is a ‘nofollow’ link counted as an outgoing link by Google? Suppose my blog post has 12 links, with 10 nofollow and 2 dofollow; how will Google look at the link count? What’s your opinion about it?

    Fourthly, if a blog post needs a backlink as a prerequisite to get indexed, don’t you think Google is deliberately pushing people toward spamming and unnatural link building?

    Finally, There is a small spelling mistake in the Recency of posts
    “Those recent posts might also need to ahve(HAVE) a certain link based score to be presented as well.”

    Thank you.

  5. I do the same thing on my blog with linking to older posts, and I see most bloggers doing the same thing. Isn’t that good SEO practice? If that is something that discounts a blog, then it’s no wonder we get horrible results when we search using the blog search. I rarely find good results when I use that search, and it’s a shame considering how many blogs are out there.
    I also see a lot of great bloggers who have HUGE followings posting YouTube videos in their posts. I personally like seeing the videos as long as they are relevant to the content.
    I take this news as them saying bloggers shouldn’t be doing SEO in their blogs because doing so hurts them.

  6. Some really interesting stuff here. I think any experienced blogger or decent SEO is most likely hitting this checklist already. As for links coming in? I wouldn’t panic. I think lots of content can rank and be displayed in Google Blog Search without links. I’m all for anything that gets rid of, or at least minimizes, “splogs.”

  7. Hi Bill,

    My take on this comment may be a little different from yours:

    “Distance of links from start of post – If the outgoing links in a post don’t appear within a certain predetermined distance from the start of a post, it might also be rejected.”

    Rather than meaning: “This appears to be intended to avoid posts that might contain too many links.”

    I propose: “This appears to be intended to avoid posts (and articles from article directories) that link out only in the form of a resource box or byline forced down to the bottom of an article or post by the receiver’s TOS, being that such content is almost always written as a means to manipulate search results.”

    This would absolutely decimate any and all platforms that accept articles or guest posts and “link out” as a means to reward their “contributors”.

    Oh this is really big for SEO…I think.

    GOOD FIND!

    Mark

  8. This is going to be huge. As I understand it, Google is going to filter all the trash that is out there, and that’s really good news for bloggers who have been working hard and developing their blogs. I also hope that one day they get rid of all those satellite SEO websites that are just bringing in fake extra points. All the new requirements will put our blogging culture on a new level and will raise the competition!

  9. I think the patent is being broad in its inclusion of criteria to document the method more fully of using a small set of blog-related metrics to define a minimum threshold of acceptability. I’m not sure it would be wise to assume that Blogsearch (which was revamped a year or two after they filed for this patent) operates this way.

  10. I hate to see that legitimate SEO can actually hurt a blog or site. It seems like a lot of this has been left fairly ambiguous as far as what links or media are undesirable, but YouTube links are a way of life and will stay that way.

  11. Bill, thanks for the info. There are many takeaways from this when blogging for SEO or just personal blogging to build a better reputation. It’s crucial to promote each blog post by linking in from a trusted third party. And quality cannot be emphasized enough. I’m sure the days of cheap, bulk blogging for manipulative SEO are over–rightfully so.

    Best,
    Henry

  12. I still have doubts about internal links to the same domain, and about electronic media. We have to add presentations or videos to elaborate more on a topic, and I think we could get lots of things embedded in Google blogs as well. If it were my company blog, I would surely add a link to my own service page.

  13. Despite the fact that these changes would indeed make our search experience better, many bloggers are going to object to them!

  14. An extremely interesting concept; if Google did implement that in their ranking algorithm, I’m sure the rankings would change dramatically.

    Cheers

  15. Just read the post and the Google Monster is continuing to shape the internet and www landscape.

    With an emphasis on video today, the section on links to electronic media really has me thinking about just what type of impact videos and electronic media will have in the future.

  16. To the point: “Distance of links from start of post.”

    I usually add only one outbound link, in the last paragraph or sentence of my blog posts. I seldom add additional outbound links within my blog posts, but in those cases there are never more than 3 outbound links in total. If that rule is intended to avoid posts which may contain too many links, it is understandable. But what about my case?

    To the point: “Links to self or same domain.”

    I understand the point about links pointing to the page itself, which is a common WordPress issue. That is a usability issue, and to be specific, a web content accessibility failure. Such an issue can only confuse blind users. As for linking to other pages on the same domain, it looks like they are trying to prevent site owners from herding the PageRank flow. Or do we disagree?

  17. Most of these filters make a lot of sense! An “honest” blogger will have no difficulty meeting them. Yet I feel that many “splogs” still appear in the search results of Google Blog Search.

  18. I agree with Mark’s point that the “distance of links from start of post” may be an attempt to curb the impact of spammy guest blog posts and articles with link attribution in the boilerplate/author box.

    Also, with the exception of the heavy hitters, the vast majority of blogs only get a limited amount of links to each blogpost. This patent seems to favor popular blogs and cast smaller blogs to the side, regardless of the quality of their posts. It’s not so surprising that Google would do this, it’s just a shame that less popular blogs won’t get their voices heard.

  19. Well, the changes might be expected… I think they’re going to be really useful for good bloggers. As for splogs, this will be an additional stimulus for them to improve their content.

  20. Google and its engineers are getting very smart with all the new algorithms and patents to better provide useful content to people when they search. I would have to agree though. A lot of blogs are spam and they are not properly optimized to say the least. I think this is a good idea. It will help the honest bloggers out there.

  21. Hi Sajeet,

    While the patent identifies a lot of things that Google might potentially use to remove or reject some posts from appearing in Blog search results, chances are that Google would experiment with those, explore other topics, find ways to refine their approaches, and so on. The items I wrote about are in the patent description, but chances are that Google might use some of those, none of those, or others as well.

    I agree that the screenshot from the patent makes it look like Google would reject many informative, legitimate, and interesting blog posts from blog search in a way that might seem draconian.

    I think what’s more important than looking at the specific things described within the patent itself is recognizing that Google might filter out some results based upon rules that might be similar in some ways to what the patent describes.

  22. Hi Neil

    Chances are that Google won’t pay too much attention to a really new blog that doesn’t have many posts or links, so focus instead on writing things that an intended audience might find interesting and engaging, and have a social sharing strategy that avoids being too self-promotional but leads people to want to learn more about the blog and its author. Build relationships with other bloggers, and links will come.

  23. Hi Andy

    It does seem that Google hasn’t been nurturing their blog search and giving it the attention that it needs, which is unfortunate because there really aren’t many options available these days. Technorati had some issues when WordPress made the change to Google results, and it hasn’t gotten better.

    I know Microsoft has a few patents on blog search, and I’ve been expecting them to bring one out, but I’ve been waiting for a while to see it. Not sure they realize what an opportunity they would have with it if they could release something that does blog search even just a little better than Google does now.

  24. Hi Asad,

    Thanks for the questions. Since this patent was filed in 2006, it likely didn’t take into account things like the rel=”nofollow”, or social sharing on sites like twitter or Facebook and how services like that might use a nofollow.

    I’m not sure that I like the “rule” about links being within a certain distance from the start of a post. It doesn’t always make sense to link just at the top of a post, and often a link is best placed where it makes the most sense within the context of a post. I suspect the patent’s authors really didn’t consider links within comments as being part of that rule, and probably just meant links within a post itself.

    Really not sure how “nofollow” would fall into what the authors of this patent were thinking about, but if Google is using this patent or something based upon it, that is something that they might have thought about. My thought is that a blog post that’s linking out to a lot of other sites, but is using “nofollow” values in most of those links may just be using “nofollow” too much. Why link to so many pages that you don’t trust?

    Google as a whole might be a little too focused upon links, though they do seem to be moving towards a number of other signals, including user behavior signals, social signals, and with the Panda update, other kinds of quality signals as well. I’m not sure that links are quite the issue now as they were when the patent was written.

    Thanks for pointing out my typo.

  25. Hi Ashley,

    I’m still trying to understand why the writers of this patent would take exception to linking to older posts. I’m not going to rewrite something that I could just link to if it’s relevant and appropriate to link to, like my two links in the second paragraph of this post. Also, many people use plugins that provide links to “related” posts as well. I suspect that particular “example” from the description of this patent is one that likely was either abandoned or reconsidered by those who work on Blog search at Google.

    I agree with you about the quality of blog search results. I usually get much better results from blogs when Google treats them as web pages in regular web search results. The question is though, why have a dedicated blog search when you show blog posts in Web search? My answer is, “So that you don’t have to look at results that aren’t blog posts.”

    I will sometimes embed a YouTube video in a post, too. I can’t see a problem with doing that either. The patent seemed to have an issue with links to pictures or videos that might contain the kind of “undesirable” content that it listed (which is in a list in my post above), maybe more than the inclusion of videos or pictures or podcasts.

    I don’t think the patent is really about doing SEO or not doing SEO on a blog post as much as it seems to be that Google is trying to come up with ways to keep from displaying splog results. There’s nothing in it that says don’t make good choices of language (keywords) that your intended audience will likely search for, or don’t make sure that search engines can crawl all of the pages on your site that you want indexed, or similar language.

    It does look like they looked at a lot of splogs to see the things that they seemed to be doing, and didn’t look at non-spammy blogs to see whether or not they were doing some of the same things.

  26. Hi Matthew,

    I’m not a fan of splogs, and I really hate it when I revisit a blog I hadn’t been to in a while, and it seems like its been taken over by someone who likes to link to weight-loss pages, and casinos, and pharmaceutical pages.

    Rather than using the “features” described in the patent that I wrote about in this post as a checklist, I think it might be better if people who would like to show up in blog search do some searches on a regular basis to see if their posts are showing up there, and try to get a sense of why they might not if they don’t.

  27. Hi Mark,

    Regarding the outgoing links being a certain distance from the start of the blog, here’s some of the language from the post itself:

    If the outgoing link(s) does not appear within the predetermined distance from the beginning of the blog post (block 930–NO), then the blog post may be identified as one not to be shown with the search results (block 920). For example, the predetermined distance may be set to any number depending upon how close it may be desired that the outgoing link(s) be from the start of the blog post. If the outgoing link(s) appears within the predetermined distance from the start of the blog post (block 930–YES), then it may be determined if the out-degree of the blog post is small (block 940). For example, a threshold for the out-degree may be set low to prevent display of blog posts that contain many outgoing links.

    I do suspect that Google is loath to include within blog search results (and possibly even web search results) posts (or pages) that they think might contain too many outward links. That might be tempered by something else, such as a good number of incoming links to a page, but I’m not convinced of that.

    The language from the patent itself, in that section, doesn’t seem to be targeting post or pages that might include links at the end as might be seen in guest posts or article pages.

    But yes, just the general idea that Google might be filtering what shows up in Blog search based upon rules that might be similar to these is definitely something to think about, and maybe experiment with.

  28. Hi Alex,

    The patent was just granted, but that doesn’t mean that what it describes is now something that Google is going to work upon. Chances are that they probably started experimenting with filtering results like this sometime within the year before it was filed, and chances are that they’ve tried a number of different rules to determine which posts to show, and which ones to not show.

    I’d love to see better results within Google’s Blog search, and hopefully they will find a way to filter out splogs while including posts that shouldn’t be filtered.

  29. Hi Michael,

    There are definitely some filters in place, and it’s likely that Google has experimented with a number of different heuristics over time to determine what they might filter, but I agree completely that the examples in the patent are probably much too broad to be what they are using.

    There are a few things in the patent that involve the look and feel of blog search results, such as Google sometimes possibly showing some “reference links” under a blog post search result, where they’ve taken links from the post itself. I just noticed a couple of those under one result in blog search yesterday, but I can’t say that they haven’t been doing that for a while. I hadn’t seen any references to those reference links anywhere else before.

    But it’s easy to see an interface feature like that, and much harder to see the internal gears that make the machine work. Chances are good that Google is using some filtering, but it’s not necessarily going to be easy to see how it might be implemented.

  30. That’s interesting. Makes sense though. It sucks having to teeter around search results at work for fear of something crazy coming up. As a web developer, I’ve experienced crazy results with searching the ‘head’ tag and the form ‘action’. Kind of ridiculous! lol

  31. Wait, wait, wait…you’re saying I shouldn’t write racist blog posts?

    J/K :)

    Common sense in SEO goes a long way. Write useful content and people will find it and share it.

  32. Hi Matt,

    None of the things listed in the patent as factors that Google might look at when deciding whether or not to filter a blog from search results could really be called SEO, so I don’t think you should be concerned about “legitimate SEO” harming a blog. Chances are as well, that at the time this patent was written, it really didn’t anticipate people embedding YouTube videos in blog posts. The patent seems more concerned with blogs that might link to pornography than anything else.

  33. Hi Henry,

    The patent is aimed at trying to identify splogs that Google doesn’t want listed in blog search so that Google might filter those, rather than giving us a set of guidelines about how to get blogs listed, such as getting a link from a “trusted” third party site. Instead of worrying about having to procure a link to a post, if you focus instead upon writing posts that people would be interested in linking to, you would probably be better off in the long run.

  34. Hi Alex,

    I agree. The features that they point to within the patent description seem somewhat shortsighted. Then again, would you really link to your service page in every post that you write? I’d probably include that instead as a sidebar or main navigation link (which is what I do).

  35. Hi Nadeem,

    These aren’t “changes” that Google is officially announcing anywhere telling people what they should and shouldn’t do with their blogs. Instead it’s a patent that Google filed for back in 2006, which was probably written back in 2005, which tells us that Google will filter some blogs from showing up within Blog search results based upon a number of rules about the posts. The patent description does include a number of rules that it could potentially use, most of which I included in my post, but chances are that even if they considered using those specific rules, they’ve probably experimented with those and amended them since.

    Chances are good that Google is using a filtering system, but the factors that they are considering are likely more sophisticated now. For example, in the blog post after this one, I included links to 97 patents at the United States Patent and Trademark Office, and a link to a news article. Those links were spread throughout the post, including many towards the end of the post. I checked a bit after that post was published, and it was included within Google’s blog search. So it looks like the number of links that post might have isn’t causing it to be filtered out of Blog Search, and the fact that I had a good number of links throughout the post, instead of just near the start of the post wasn’t either.

    Even though there were a lot of links, and many links near the end of the post, they pointed to pages on a couple of legitimate and authoritative sites. From that post alone, it seems that Google’s approach to filtering posts is more sophisticated than what the patent’s description tells us.

  36. Hi Danny,

    Chances are really good that Google has added a number of filters to not only blog posts, but also to most of the search results that we see in all of their searches. Perhaps the best known of those is Google’s Safe Search, which filters out adult content from Web search results and from image results.

    I think the important lesson that this patent has for us is that they also apply filters to Blog Search as well, and that the kinds of filters that Google applies to their searches can be based upon more than just certain types of content.

  37. Hi Barry,

    Google announced that they were acquiring YouTube on October 9, 2006. This patent was filed on March 22, 2006. Chances are that they thought a little more about whether blog posts with videos might potentially be harmful since it was filed. The patent seems more concerned with people linking to pornography than with people linking to videos.

  38. Hi John,

    My post after this one, on February 17th, 97 Hewlett-Packard Patents Assigned to Google contains 98 links, many of which are spread out throughout the post from start to finish. I also included 36 named anchor links on the page, pointing to other parts of the page, so I covered the “linking to the same post” filter described in the patent as well.

    The post wasn’t filtered out of Google Blog Search.

    I’m not sure that you need to worry too much about those aspects of the filtering system described in this patent.

  39. Hi Astyanax,

    I don’t think that many of the filters described in the patent as possible filters that Google might use to try to stop manipulative blog posts, or splogs, really have much to do with honesty or morality. Instead, they are mechanical rules that a computer might be able to follow, and they were used as examples within the patent’s description. If you read my response to John in the comment just above this response, you’ll see that I tested three of those rules with my post after this one, and that post wasn’t filtered out of search results. It wasn’t a splog post, and the links I used were perfectly fine and legitimate links. No dishonesty involved. :)
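
    To show what I mean by mechanical rules, here’s a rough sketch in Python of what a filter built from the patent’s example rules might look like. Everything in it, from the thresholds to the field names, is an invented placeholder of mine for illustration; the patent doesn’t disclose actual values, and this isn’t Google’s code.

```python
# A rough sketch (mine, not Google's) of the rejection rules listed in the
# patent's abstract. Every threshold and field name below is an invented
# placeholder; the patent doesn't disclose the actual values or structures.

MIN_LENGTH = 250         # "insufficient length" cutoff, in characters (assumed)
EARLY_LINK_CUTOFF = 100  # "predetermined distance" from the start of a post (assumed)
MAX_OUT_DEGREE = 50      # "large out-degree" cutoff on outgoing links (assumed)
MAX_AGE_DAYS = 365       # "predetermined time" window for post age (assumed)
MIN_LINK_SCORE = 0.1     # minimum link-based score for incoming links (assumed)

def should_reject(post):
    """Return True if a post would be filtered out under these example rules."""
    # Rule 1: the post is too short
    if len(post["text"]) < MIN_LENGTH:
        return True
    # Rule 2: outgoing links sit too close to the beginning of the post
    if any(pos < EARLY_LINK_CUTOFF for pos, _url in post["outgoing_links"]):
        return True
    # Rule 3: the post has a large out-degree (too many outgoing links)
    if len(post["outgoing_links"]) > MAX_OUT_DEGREE:
        return True
    # Rule 4: the post was created outside the accepted time window
    if not 0 <= post["age_days"] <= MAX_AGE_DAYS:
        return True
    # Rule 5: the post's incoming links all carry a low link-based score
    scores = post["incoming_link_scores"]
    if scores and max(scores) < MIN_LINK_SCORE:
        return True
    return False
```

    The point isn’t the specific numbers; it’s that each rule is a simple check a machine can run without making any judgment about honesty or morality.
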

  40. Hi Jose,

    I don’t think you need to worry about the filters described in the patent I wrote about favoring larger and more popular blogs over smaller and less well known blogs. If bloggers focus upon blogging about things that interest their audiences and that those audiences might be interested in linking to, or tweeting about, or sharing with others on Facebook or Google Plus or elsewhere, chances are that shouldn’t be a concern.

    We have no idea if that particular approach to filtering posts from Blog Search is in place, but the world and the Web have changed a lot since that patent was filed. Chances are that there are filters that keep some blog posts from being shown in Google Blog Search. But they aren’t necessarily the ones described in the patent.

  41. Hi Sandra,

    I don’t think that the aim of the patent is to get people running splogs to improve the content that they write, but rather to try to eliminate splogs completely, or at least to not show them in Blog Search results if possible. I’m not sure that many of the filters described in the patent are ones that will necessarily improve the quality of blog posts either.

    They are examples of things that Google might look for when doing filtering, but I’m not sure that they are the best examples. Then again, it may be that Google tried to avoid giving better examples on purpose. If they gave us an exact roadmap of how to avoid the kinds of filters that they might actually use, then people who run splogs might use those to try to get their splogs to avoid those filters.

  42. Hi Mike R,

    There are a lot of spam blogs being created every day, and there’s definitely a need to filter those, to ignore the links within them, and to try to keep them from showing in search results. There are too many of them to manually eliminate from search results, which is why Google would try to come up with some kind of filtering approach.

    I do get a little worried, though, when I see something like this patent, about whether the approach involved might keep legitimate blogs from being indexed and displayed in search results.

  43. Hi Thomas,

    I get surprised sometimes with things that I see in Web search results. For instance, pages that are dead and have no content at all, where the cached copy still shows an empty page with no content. How does a page like that continue to rank for a specific term when there’s no content there? Of course that ranking might have a little to do with links, but still?

  44. Hi Matt,

    We do still see some racist content showing up in Web Search from time to time. I just tried a search and see some in Google’s Blog Search as well. I suspect that’s an area where Google is facing a conflict between filtering that kind of content out of search results and trying to observe the First Amendment rights of publishers to publish. I don’t want to see racist results, but I’m also a little worried about Google censoring things.

    I agree with you though. Write interesting and useful content and people will link to it and share it.

  45. Pingback: » Search Engine Marketing Wrap-up Jan 19
  46. Hey Bill! Good stuff!

    I agree with the four things you added into the post: topicality, quality, the freshness of a post, and its significance. Google is certainly placing more emphasis on the relevancy and usability of a blog. Social proof will also be a factor: how quickly and heavily a particular blog post has been shared on social media or linked to by other blogs / sites. I can certainly see that increasing even further as a ranking factor.

    However, there are many cases where I still see archaic sites outranking fresh new sites with killer content. Yes, they have less PageRank, but their usability, content, and relevancy are higher than the old archaic sites’, yet they still have trouble ranking for their targeted keywords.

    For me, the consistency is just not there yet. It should work like this.

    Someone builds a site, blogs killer content for 12 months straight, gains some decent links, and has a better optimised, better structured, and more usable relevant site. That person should outrank the other blogger who doesn’t update his content, has an aged domain, has crap content, and a higher quality of links.

    Great stuff keep it coming.

  47. Hopefully when implemented it means that when you search for something you might actually find it without having to go to page 3 or 4 and sometimes further to find what you are actually looking for. As well as that getting a bit tired of the first couple of Google pages taking up with outdated information or non relevant information.

  48. Well, if they want to eliminate the existing spam blogs, that’s OK. Probably for a start it will be a combined method of filtering. Applying this the way it’s described in the patent would not be beneficial for domains like google.it, google.hu, and so on, where it’s really difficult to achieve links and social mentions for a niche-based blog.

  49. Blog authority, unique content, freshness of an article and social shares are signals for Google.

    And now, Google+ is a good way to communicate about new blog posts on social media !

  50. Hey Bill,
    very interesting article, thanks for your constant great work. Just to mention it: Comments always do have an image- and/or reputation-dimension too.
    Best regards from Germany
    Udo

  51. When this comes to play, it would be quite interesting to see how this will affect Google rankings.

  52. This is definitely a bad sign for new and developing bloggers. Those who have already established themselves would see little effect from the new changes, as their sites would have incoming links and new, updated quality content that is shared and re-tweeted hundreds of times.
    New bloggers would definitely find it hard to grow with these changes being applied to online blogs.

  53. Hi Bill,

    Really interesting article, and something which is becoming more and more required in search results, I feel. These days it feels like you really have to dig around if you want a ‘good’ result for a long-tail search, for example. Typically the majority of sites displayed in blog search are just scraped content, totally useless. Google need to do something about it to ensure that ‘blog search’ is a viable search method; at the moment it really isn’t.

    I imagine for already well established bloggers, a change such as discussed in this post won’t have a huge effect, other than perhaps bringing in more visitors. However, as Neil says (2nd comment), for those who are just starting in blogging, it might become a real pain to even get noticed in the first place – but maybe the positives will outweigh this (relatively) small negative. There are plenty of other ways outside of Google search to get your blog noticed.

  54. I’m new to blogging and was only starting to get my head around it, but I suppose that’s the way of the web, it’s malleable and nothing stays the same. Thanks for the information, I hope Google still gives us newbies a chance to build a blog although I can see that this is going to shake things up!

  55. I totally agree that this flow chart doesn’t seem to make any sense; how can a blog be found in search unless it already has a very regular user base? Hopefully if Google do decide to try out this patent, they will be wise enough to realize how bad the move was in the first place… wishful thinking maybe?

  56. Hopefully, with Google filtering blogs within their search, it could largely cut down on all of the dead ends of the web, making it easier to find relevant content when you need to. This is a smart move from Google, and it shows that Google are still focusing on better search just as they did when they first started.

  57. It is all very well discussing the social networking values of blogs. If social shares are important to Google for search results then I am a goner. The profile of my website(s) and blog users tends to be in the much older age groups, which are the lesser informed segment on the importance of using the on-page features to add their voice for social interaction. Therefore they tend not to engage with websites. They are less likely to add their comment even though we make it so easy and obvious for them to do so. The older age groups on average have more time to devote to engage, yet are disinclined to do so. My beef is that most of today’s movers and shakers of SEO are young things. How could they possibly know the older mind set? If you have a similar problem, how are you addressing it?

  58. Hi Bill
    Confusion or what… I am fairly new to all this, but statements like the one below from your post are worrying. I consider linking to other posts in your blog good practice, but it seems this might get your content removed. Also, the fact that new blog posts might not get a link immediately might see them removed. Very interesting post, but it does make you wonder where it will all lead.

    ‘Links to self or same domain – Blog posts with links to the same domain, whether to the post itself or other pages on the same domain, might also be removed from the repository, though the patent tells us that those links within the same domain might be ignored instead’

    Thanks
    Jon

  59. Interesting exercise, but I think even with this patent, Google is trying, as it always does, to find the best content and kick back the not-so-good content. I’m not sure if they are doing a good job. One of the key reasons I even use blog search is to get the freshest content. Isn’t this one of “blog search’s” intentions? If so, I think they should give more relevance to age and less to inbound links. Of course, I think Google should measure the on-page factors to reduce the splogage. Good, fresh content is what I want. :)

  60. With the fact that internal links can actually get a post rejected, there is definitely going to be an impact when applying siloing techniques that are based upon internal linking. Still, the main target page, getting all the internal links of the silo, shouldn’t be affected; is that correct?
    Thanks for the great read.

  61. Hi Illiya,

    Thank you. Good points. One of the things that I’ve sometimes seen is that blog posts aren’t always about topics that change or improve with freshness. Sometimes for some topics, an older post is a better post, and I think Google tries to recognize when that might be the case. I’m not sure that they always succeed, but they do seem to try.

    The Google Inside Search blog had a “40 changes to Google’s algorithm” post a few days ago, and one of the areas that those changes seemed to focus upon most was the way that Google works freshness into the results it presents. Maybe those changes will be ones that help in the areas you’ve described.

  62. Hi Steve,

    Since this patent was originally filed in 2006, and was probably written in 2005, it’s possible that Google has tried out a number of the things described in it since then. I suspect that Google has tried out many other rules as well.

  63. Hi Claudiu,

    I’ve been reading recently that Google seems to have started going after, and deindexing or devaluing, a number of private blog networks that seemed to exist pretty much solely to help boost the rankings of members’ pages by linking to them. That’s not quite the kind of filtering that the patent talks about, but the goal is similar.

    I’m not sure that many of the methods described in this particular patent would really harm blogs in locations like you mention. Most of the rules that Google might consider, or more sophisticated rules that might have evolved from those rules wouldn’t particularly harm too many blogs that might cover niche topics where links and social proof might be harder to attract. And I would suspect that Google might take that particular problem into account anyway.

  64. Hi Mary,

    Google is definitely looking at a wider range of signals, including social signals from places like Google +, and chances are that those could influence whether or not blog posts might be included in Google’s Blog search.

  65. Hi Udo,

    Thanks. I’m not sure exactly what you mean by an “image and/or reputation dimension”. The patent is pretty much silent on most things related to comments for a blog post. It’s possible that Google might be paying more attention to that aspect of blogs these days, though.

    Google did introduce authorship markup for content creators to associate with post or pages or articles that they create. I suspect that we could see something similar develop for comments on blog posts as well.

  66. Hi Shipu and Srikanth,

    Chances are that Google has been applying one type of filtering or another to blog posts since it started a blog search. Chances are also good that the rules described in this particular patent may be ones that they’ve already tried, and in some cases have moved past already.

  67. Hi Andy and Rob,

    I suspect that for most bloggers who aren’t engaged in something like a private blog network, or using their blogs to try to manipulate search rankings, or who might be blogging about some of the topics that Google mentions in the patent as being undesirable, there probably won’t be much of a problem. That’s true even for fairly new blogs, once they’ve posted more than a few times, so that it looks like they might be around for a little while.

  68. Hi Luke,

    It is a patent that was filed a number of years ago, and chances are that while Google is probably applying some kind of rules that might determine whether or not a blog might be included in Google’s Blog search, those rules have probably evolved and changed over time.

  69. Hi Jeffrey,

    You would hope that Google could do a good job of filtering out splogs without filtering out blogs that provide quality content. I still see things that I might consider splogs in some Google Blog search results, but I suspect not as many as there could be.

  70. Hi Gerald,

    I’m not necessarily a spring chicken myself, yet this post has over 70 comments on it, and more than 130 tweets. I try to encourage conversations, respond to comments, and engage in social media myself. If you pigeonhole yourself into generational thinking, you may be bound by it.

  71. Hi Jon,

    I’m not sure that it’s ever been a bad practice, when it comes to blogs, to link to and reference relevant content on your own domain when it adds value to what you’re writing. It might be a problem when you do something like link to your home page every time you mention a word or phrase that you’re optimizing that page for, like me possibly doing that every time the word “SEO” shows up in one of my posts.

    I think one of the important takeaways from this patent isn’t so much the example rules that they provided, which not only seem somewhat outdated but in some cases muddled and wrongheaded, but rather that there may be (and likely are) rules that Google uses to filter some posts. Chances are that they are more sophisticated than the ones from the patent.

    I purposefully broke at least three of the rules from this patent in my post after this one (see my comment from above at: http://www.seobythesea.com/2012/02/google-filter-blog-posts/#comment-423248 ), and the post appeared in Google’s Blog search.

  72. Hi Donnie Lee,

    I agree with you that the purpose of Google Blog search is probably to show people fresher content, and it’s likely that is more important than the number of links to a blog post. Blog posts can still show up in Google search results, and links should play a more important role there.

    It’s possible that Google could (or may) look at things like social shares these days the way that they might have been considering looking at links back in 2005 when they were probably writing the patent.

  73. Hi Eliseo,

    A link on your own domain to useful, relevant, and related information probably shouldn’t cause you any problems whatsoever. A link on a blog post about cooking to a page on your site with a review of a hosting program and an affiliate link to that program might potentially be a different story. :)

  74. The thing that bugs me about this is that Google comes out with these standards and people follow them, thus allowing Google to control the web and shape its content. I cannot believe they’d consider filtering posts that have links to one’s own content!

  75. Hi Marisa,

    These are not standards, and should not be treated as standards. Regardless of what the patent’s descriptions might say, there’s no guarantee that Google will use these particular rules to filter out blogs from Google’s Blog Search. The more important thing to know is that they do likely have some kind of rules in place to filter out splogs, and blogs that they think are manipulative.

  76. I was just using Google Blog Search to find blogs that talk about Google+. I was actually really disappointed in my experience when I selected “Blog Homepages for Google Plus”. Yes, some of the results were OK, but the others were just terrible. Now this could be because there really aren’t that many good blogs on the subject, but I was just expecting more.

  77. Hi Chris,

    I can’t say that I’ve been very impressed with Google’s blog search either. I’m not sure that it’s getting the attention that other efforts at Google are, and I wish that it was.

  78. It sounds like most of the blog posts on the internet should be rejected, basing that off of the criteria listed in the flow chart. It’s a very small window of opportunity to write the perfect blog post, according to that. Very interesting article, thanks for sharing!

  79. Hi Molly,

    I agree. I don’t think that Google is using many of the things they included in the patent description in quite the way that they are described. It would probably exclude too many blogs and blog posts. But I do think that they do some filtering, though the criteria they use are probably much more sophisticated.

  80. Hi Bill,
    A good SEO won’t get kicked by Google, so I guess this just makes the difference between good and bad Internet marketers.
    Thank you for your dedication in seobythesea. It is really extraordinary.

    Ric

  81. Hi Ric,

    Thank you. I’m not sure that this patent really has much to do with internet marketers or how they might be treated by Google.

    It does give us a window into how Google might like to see their blog search being developed, but unfortunately I get the sense that blog search really hasn’t been much of a priority for them.

  82. Thanks for bringing these points to light Bill. This update will be a huge help for bloggers who actually take the time and effort to develop good non-spammy content. I’m excited to see where Google will be going with this.

  83. Hi Keith,

    Not exactly sure of the timing of the methods described in the patent, and if it is an update that Google has pursued, or tried and turned to something new, or hasn’t even implemented yet. But I agree about the value of good content in blogs, and how that can be extremely helpful both for visitors, and for how you might be ranked in search.

  84. Writing good and unique content not only helps your blog rise in search results, it also helps increase readership. Don’t steal someone’s article; write your own and add fresh information. Keep in mind that you should have good knowledge about the topic you are going to write about.
    Thanks for explaining the filtering of blog posts.

  85. Bill,

    I think Google should develop an algorithm to identify low quality guest posts written just to get backlinks. Nowadays lots of SEO forums are doing it to build links for clients. Don’t you think?

Comments are closed.