Back in 2007, I wrote about a Yahoo patent describing how Yahoo! might crawl a webpage and then recrawl the same page about a minute later to see if any of the links on the page had changed. It might do that to identify what it called “Transient Links,” or links pointing to things like advertisements that might change on every visit to a page, which aren’t links that the search engine would want to crawl and index. The post is A Yahoo Approach to Avoid Crawling Advertisement and Session Tracking Links.
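The recrawl comparison described in that post can be sketched in a few lines of Python. This is a minimal illustration, assuming both fetches of the page are already available as HTML strings; the function names and example URLs are hypothetical, and it is not Yahoo’s code. Any link that shows up in one fetch but not the other is treated as a candidate transient link.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def transient_links(first_fetch, second_fetch):
    """Links present in only one of two fetches of the same page."""
    first, second = extract_links(first_fetch), extract_links(second_fetch)
    return first ^ second  # symmetric difference: in one fetch but not both


# Two hypothetical fetches of the same page, taken about a minute apart.
fetch_1 = '<a href="/about">About</a> <a href="https://ads.example/x?id=17">Ad</a>'
fetch_2 = '<a href="/about">About</a> <a href="https://ads.example/x?id=42">Ad</a>'

print(sorted(transient_links(fetch_1, fetch_2)))
# prints ['https://ads.example/x?id=17', 'https://ads.example/x?id=42']
```

The stable "/about" link survives both visits, while the two ad links, each seen only once, are flagged.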
Google was granted a patent this week on a similar topic that looks at “transient” content on web pages. While this kind of content might include advertisements that change on return visits to a page, it could also include things like current weather forecasts (for example, “Warrenton, Virginia, 40 degrees and cloudy”). That kind of content changes on a regular basis, but often has little to do with the content found elsewhere on a page.
Google would want to be able to identify transient content so that it wouldn’t index pages based upon it, and it wouldn’t show advertisements that focus upon it either.
Content and Tokens
Instead of looking at URLs on pages like the Yahoo patent does, this approach might look at the actual HTML code on pages and break it down into tokens. The search engine might then hash each token into a fingerprint, so that identical tokens can be found by matching fingerprints. For example:
The HTML code for a page might look like this one day:
<html><header><title>Hello</title></header><body> <h1>First section</h1> <p> <em> Today is Sunday, June 24th, 2007. </em> </p> </body> </html>
A second version of the same page might be retrieved by the search engine the next day with a few changes, like the following:
<html> <header> <title>Hello</title> </header> <body> <h1>First section</h1> <p> <em> Today is Monday, June 25th, 2007. Weather Forecast: Sunny. </em> </p> </body> </html>
The search engine might break the markup language for the first version of the web page into tokens as follows:
1: <html>
2: <header>
3: <title>
4: Hello
5: </title>
6: </header>
7: <body>
8: <h1>
9: First section
10: </h1>
11: <p>
12: <em>
13: Today is Sunday, June 24th, 2007.
14: </em>
15: </p>
16: </body>
17: </html>
It might break the second day’s markup language into very similar tokens:
1: <html>
2: <header>
3: <title>
4: Hello
5: </title>
6: </header>
7: <body>
8: <h1>
9: First section
10: </h1>
11: <p>
12: <em>
13: Today is Monday, June 25th, 2007.
14: Weather Forecast: Sunny.
15: </em>
16: </p>
17: </body>
18: </html>
These tokens might then be placed in a data table so that the two versions can be compared quickly to see what has changed and what hasn’t.
Google may use those comparisons to determine that some of the content on a page changes regularly, but doesn’t impact the rest of the content on that page. If so, it might decide that the content is “transient.”
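To make the tokenizing and comparison steps concrete, here is a minimal Python sketch using only the standard library. It is not Google’s implementation: the sentence-level splitting of text, the use of truncated SHA-1 digests as fingerprints, and the names `tokenize`, `fingerprint`, and `changed_tokens` are assumptions chosen for illustration. Each version’s fingerprints go into a set (standing in for the patent’s data structures), and tokens whose fingerprints appear in only one version are reported as changed.

```python
import hashlib
import re
from html.parser import HTMLParser


class Tokenizer(HTMLParser):
    """Splits markup into tag tokens and text tokens, in document order."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"<{tag}>")  # attributes are ignored in this sketch

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        # Assumption: text is split into sentences, so the date and the weather
        # line inside the same <em> become separate tokens, as in the lists above.
        for sentence in re.split(r"(?<=\.)\s+", data.strip()):
            if sentence:
                self.tokens.append(sentence)


def tokenize(html):
    parser = Tokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def fingerprint(token):
    """64-bit fingerprint: the first 8 bytes of the token's SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:8], "big")


def changed_tokens(old_html, new_html):
    """Return (removed, added): tokens whose fingerprints occur in only one version."""
    old_tokens, new_tokens = tokenize(old_html), tokenize(new_html)
    old_table = {fingerprint(t) for t in old_tokens}  # fingerprint table per version
    new_table = {fingerprint(t) for t in new_tokens}
    removed = [t for t in old_tokens if fingerprint(t) not in new_table]
    added = [t for t in new_tokens if fingerprint(t) not in old_table]
    return removed, added


day1 = ("<html><header><title>Hello</title></header><body> <h1>First section</h1> "
        "<p> <em> Today is Sunday, June 24th, 2007. </em> </p> </body> </html>")
day2 = ("<html> <header> <title>Hello</title> </header> <body> <h1>First section</h1> "
        "<p> <em> Today is Monday, June 25th, 2007. Weather Forecast: Sunny. </em> </p> "
        "</body> </html>")

removed, added = changed_tokens(day1, day2)
print(removed)  # ['Today is Sunday, June 24th, 2007.']
print(added)    # ['Today is Monday, June 25th, 2007.', 'Weather Forecast: Sunny.']
```

The tokenizer reproduces the 17 and 18 tokens listed for the two versions. Deciding which changed tokens are actually transient (for instance, requiring them to keep changing over several visits while the surrounding tokens stay put) is left out, since the post doesn’t say how Google makes that call.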
Google may also look at where certain content appears within HTML paths on the different pages of a website, to try to find transient content that might recur on multiple pages. An HTML “path” might be something like <html><body><div><ul><li>, where specific content appears after a number of open HTML tags, as if in a “path.” If content that has been identified as transient appears at a certain path on one page, and a number of other pages have the same HTML paths, the content on those other pages might be analyzed to see if it is transient as well.
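The HTML-path idea can be sketched the same way. This hypothetical helper records the chain of open tags above each piece of text, then lists the pages of a site that share a path already known to hold transient content. The page URLs, markup, and function names are made up for illustration, and the patent doesn’t spell out its matching rules.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they are not pushed onto the path.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class PathCollector(HTMLParser):
    """Records the chain of open tags above every piece of text."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.text_by_path = {}

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            path = "".join(f"<{t}>" for t in self.stack)
            self.text_by_path.setdefault(path, []).append(text)


def text_by_path(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.text_by_path


def pages_to_check(transient_path, site_pages):
    """Pages that contain text at a path already flagged as transient elsewhere."""
    return [url for url, html in site_pages.items()
            if transient_path in text_by_path(html)]


site_pages = {
    "/a": "<html><body><div><ul><li>40 degrees and cloudy</li></ul></div>"
          "<p>Article A</p></body></html>",
    "/b": "<html><body><div><ul><li>45 degrees and sunny</li></ul></div>"
          "<p>Article B</p></body></html>",
    "/c": "<html><body><p>Contact us</p></body></html>",
}

weather_path = "<html><body><div><ul><li>"
print(pages_to_check(weather_path, site_pages))  # ['/a', '/b']
```

Pages "/a" and "/b" share the widget’s path and would be candidates for a closer look; "/c" has no text at that path.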
The patent is:
Identifying transient portions of web pages
Invented by Eran Gabber, Michael Flaster, Ruoming Pang, Shanmugavelayutham Muthukrishnan
Assigned to Google
US Patent 8,086,953
Granted December 27, 2011
Filed: December 19, 2008
Abstract
Systems, methods and computer readable media for identifying transient content in web pages. Transient content can be identified, for example, by parsing different versions of the same web page into tokens, and inserting fingerprints associated with the tokens into data structures. The data structures can be compared to each other to identify differences between the web pages, thereby identifying transient content associated with the web pages.
Take-Aways
Google may also look for boilerplate content, which is often the same for more than one page of a site and may be the same for all pages. That boilerplate information might include copyright notices, navigation bars, and text and other information in sidebars, footers, and headers that stays the same from one page to another. When it comes to indexing, boilerplate information might not be weighted as highly as the main content of a page, which changes from page to page. Transient content may appear in the same areas as boilerplate content, but the two differ from one another.
Unlike boilerplate, “transient” content might change from one visit to another by a search engine crawling program, and could as easily be contained in a main content area of a page as in other sections such as a header, footer, or sidebar. It might include things like the time and date, weather information, advertisements, or other content that isn’t necessarily relevant to the content on a page that remains relatively static from one visit to another.
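The difference between the two can be expressed as a pair of checks: content that changes between visits to the same page looks transient, while content that stays the same on every page of a site looks like boilerplate. The sketch below assumes each page has already been split into named blocks (the block names, pages, and sample text are hypothetical). It only illustrates the distinction drawn in this post and is not a method described in the patent.

```python
def classify_blocks(visits_by_page):
    """Label each (url, block) pair as "transient", "boilerplate", or "main content".

    visits_by_page maps a URL to a list of visits; each visit maps a block name
    (a hypothetical segmentation, e.g. "header" or "footer") to its text.
    """
    labels = {}
    for url, visits in visits_by_page.items():
        for block, text in visits[0].items():
            if len({visit.get(block) for visit in visits}) > 1:
                # Changed between visits to the same page.
                labels[(url, block)] = "transient"
            elif all(other[0].get(block) == text for other in visits_by_page.values()):
                # Identical on every page of the site.
                labels[(url, block)] = "boilerplate"
            else:
                labels[(url, block)] = "main content"
    return labels


site = {
    "/": [
        {"header": "Warrenton, Virginia, 40 degrees and cloudy",
         "main": "Welcome to our store", "footer": "Copyright 2011"},
        {"header": "Warrenton, Virginia, 43 degrees and sunny",
         "main": "Welcome to our store", "footer": "Copyright 2011"},
    ],
    "/tents": [
        {"header": "Warrenton, Virginia, 40 degrees and cloudy",
         "main": "Four-season tents", "footer": "Copyright 2011"},
        {"header": "Warrenton, Virginia, 41 degrees and cloudy",
         "main": "Four-season tents", "footer": "Copyright 2011"},
    ],
}

for (url, block), label in classify_blocks(site).items():
    print(url, block, label)
```

Here the weather header is labeled transient on both pages, the copyright footer is boilerplate, and each page’s own text is main content.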
Transient content might also be content that is relevant to the rest of the content on your pages, and there’s a question about how Google might treat that.
There are probably ways to distinguish between content like weather updates that might update daily and “featured” product descriptions that might link to deeper pages on a site and update daily or hourly, but the patent doesn’t really distinguish between the different types of content or explain how it might tell the two apart. Some sites feature other content, such as revolving testimonials, quotes, or definitions that might change at random, and the patent gives no indication of how Google might handle that either.
If content includes links to and descriptions of “featured” products, and those are random or update very quickly, that might increase the chance that Google sees that content as transient. If you want to show testimonials and have them indexed, the best approach might be showing some testimonials that don’t change along with a link to a fuller testimonial page.
Of course, it doesn’t hurt to test these kinds of things either, if you have the time, ability, and inclination. We don’t know if Google has implemented this “transient” content approach, and if they have, what features and limitations and controls they might have placed upon it.
The patent also doesn’t discuss sites like news sites or blogs, which might use very similar HTML paths for content like fresh news articles or blog posts that change as well, but there are likely ways to distinguish that kind of frequently changing content, which the search engine does want to index, from transient content that it doesn’t.
Hi Bill,
Thanks for a good article. It’s quite an interesting topic.
From what you know, would Google be able to use this to identify sites where news appears (and gets edited frequently) to help index relevant news stories faster and better, and so stay on top of world crises?
– Michael
Hooooly Smokes!!! Bill, I actually do this on my sites. Well, looks like that gig is up. Thanks for the heads-up.
You know Bill, posts like these are why I read your blog. Matt Cutts was right. This is the best SEO blog around.
Please keep writing.
Mark (A loyal reader) 🙂
Bill, thanks again for sharing. My question is: don’t you think this could be used as Google’s most effective weapon against paid links?
Hi Bill,
Thanks for the great info. Correct me if I am wrong, but it seems that this patent is somewhat hinting toward crawling links that are template driven. For example, links like top navigation or side navigation, which are present on each and every page, need not be crawled again and again. This could also mean that crawlers will be able to devote more time to crawling other pages. As far as affiliate links are concerned, it would be very interesting to know Google’s approach toward AdSense as compared to other third-party platforms.
– Sajeet
Hi Bill,
Very interesting article. It really gives a taste of where we are and how complex things are becoming…
All the more reason to be careful where important links are placed (or more importantly not placed).
If sidebar and top nav links risk being classed as transient or boilerplate, as suggested by Sajeet, will it have a negative effect on template-based sites and encourage more use of important links within the body text (as a lot of people do already)?
I personally can’t really see why nav and sidebar links would be classed as transient unless they are rotating or randomly displayed, but who knows? If they are in the same blocks as transient links, they may be affected.
Really interesting and it makes sense to me, I’m surprised Google hasn’t done this before in some way or another.
I like how you are explaining this Terry. I get what you’re saying and agree that if they’re in the same blocks, then they really may get affected.
Thanks for explaining the concepts of transient and boilerplate contents, totally new to me.
Happy SEO in 2012 to all the readers of this blog! Without a doubt, in 2012 it will still be a benchmark for all blogs on SEO.
Very helpful information. One thing I learned about SEO from an article I read is that people should not focus too much on the meta tags but more on the title and the description tags. But I certainly agree with you that in this day and age, there is much pressure to learn and be good at SEO to be successful in marketing online.
Thanks for explaining transient and boilerplate contents.
A penny dropped for something I heard to get around this but didn’t know where it fit into the bigger picture until reading your blog.
That was to use Java for rotating ads or similar content, as Google doesn’t read the Java, so your site doesn’t get punished for it.
As usual, you ask and answer the kind of questions that usually get overlooked.
Here are my immediate questions about transient content.
What do you think it will mean for things like news snippets on websites? I include one on my site with an RSS feed from the same niche so that Google sees data change daily on my pages rather than weekly. I hear that this is important for maintaining rank.
Another very interesting thing will be how it treats forum posts. I know that some black hat SEO guys will just change a sig for a day or two to inflate a site’s rank and then change it to the next client, which effectively changes it in the sigs for all posts by the SEO firm. I also wonder whether the patent will devalue forum posting for SEO purposes. That would make blogs even more popular!
Brilliant topic. Thanks for sharing and explaining transient content, transient links, and so on.
To be honest, the only use I can see Google having for it is to strengthen their ability to distinguish template from content and to differentiate pages/posts from each other. Since we already know that Google treats site-wide links differently, I don’t think that we are going to see some surprising features from Google in regard to this patent. With that said, I might be shortsighted 🙂
Hi Bill,
Interesting article. I’m finding it difficult to keep up with changes in HTML and SEO, but your articles do make it easier.
Thanks, and I wish you a wonderful 2012.
Gulshan
Thanks for the article Bill. So are you suggesting that transient content is now taboo and that the only kind of content I should consider is permanent content related to my site?
Thanks for the great post Bill. It seems that e-commerce sites which use a lot of boilerplate content are likely to be affected by this patent. Also, it is a good thing that Google is aiming to identify pages with relevant content and pages which have a lot of affiliate content present.
Looks like 2012 is going to be an interesting year for everyone in the SEO world.
Hi Bill,
Having read this interesting blog article, I am now able to understand Google’s reasoning behind this better.
Thanks!
I’m keen to see how this may affect high traffic blogs with frequently changing content.
As for eCommerce sites, not much has really changed in how the big G will view their content, since it has been able to identify boilerplate content in navigation menus for a while.
Bill,
Thanks for your reply.
Yes, I understood that from your replies. Actually, I think it is a good thing if Google will be able to really grasp the ultimate focus of the page and will not be “distracted” by the “noise.” It will enable them to serve much more targeted traffic.
Another point I saw in one of your replies, is the issue of embedded RSS feeds… if Google will decide not to index these links, it probably will have a huge impact on rankings of many sites that currently have lots of backlinks pointing from their feeds all over the web.
Hi Michael,
Thank you. There are some kinds of sites that do update frequently, like news sites and some blogs. This patent isn’t so much about identifying that kind of content and sites where that kind of information is updated very frequently, but more about identifying content that really doesn’t add anything to the main content of a page, such as a weather report that updates daily or a certain number of times a day.
When the Washington Post, for example, includes a weather forecast widget at the top of its pages that updates frequently, it’s something that Google might learn to recognize as transient content, so that it won’t include that weather forecast information when it indexes those pages or use it as content to base advertisements upon.
The process where Google might revisit pages and recognize that there is new content to index, and learn about how frequently a site tends to update, so that it can crawl and capture fresh content is a little different. When it visits those types of pages, it might notice news lead type paragraphs that point to new news articles upon frequent revisits, and recognize that it is content that a site owner does want indexed.
Hi Mark,
Thanks for your kind words.
I don’t think there’s any absolute harm in having some transient content on your pages, but what concerns me is how easy or difficult it might be for Google to distinguish between transient content and things like a list of random links to “featured” products or stories. I’m not sure that the patent itself is helpful in telling us how and when Google might distinguish between the two.
Hi John,
I’m not sure if this is something that might be extremely effective against paid links, or at least against paid links that don’t change over time, unlike links such as those served by AdSense, which often point somewhere different on every visit.
Google’s reasonable surfer approach seems like one method that might diminish the impact of paid links, and Google’s tracking of content and links over time, pursuant to the historical data patent and its children, might be another that can help in some ways to identify those types of links. If a link suddenly appears on a page, embedded in a main content area for instance, and the anchor text of that link is completely unrelated to the rest of the text on the page, and the page being pointed to is a commercial landing page offering goods or a service, those could be signals worth investigating as to whether or not the link is a paid link.
Hi Sylvain,
I think things have always been more complex than most people are willing to guess at, or anticipate.
The kind of transient content that this patent filing attempts not to index or use for ads is content that would lower the quality of search results if it were indexed, and make advertisements less relevant for pages if it were used to categorize those pages.
Hi Sajeet,
The patent doesn’t come out and say that it is looking for templates specifically, so it doesn’t limit itself to that, but it does say that it might try to understand the HTML structure of pages, whether or not that structure is replicated on more than one page, and whether the same kind of transient content is appearing on multiple pages. So, it could be used to identify transient content that appears in templates as well as in similar HTML structures on other pages, such as a widget that is formatted a certain way.
Hi Terry,
One of my thoughts while reading the patent was that it might not be a good idea to include random links on a page to other pages on the Web or on your site if the text of those links, or that accompanies those links are something that you might want to have indexed. I wrote this post shortly after finding the patent, and it’s not something that I’ve had the chance to try to test over any period of time, but it’s something that might be worth testing.
I think footer, header, and sidebar content might have a greater chance of being considered boilerplate if it doesn’t change on a regular basis, and content appearing in those areas that changes either very frequently or randomly stands a chance of being identified as transient. One of the situations I thought of while reading the patent was random testimonials that might appear almost anywhere on a page and change from visit to visit. They might not be harmful to the ranking of the page itself, but that content could be considered transient, and might carry little if any weight when it comes to indexing a page.
@Michael (first response)
That is a good question. My site is very dynamic, and rotates through different ‘listings’ on the homepage as they come in (new postings), so that content is changing on a daily, sometimes hourly, basis. This would fall into the same thought process as you asking about news, because it changes so quickly and often.
However, my take on it is, if Google has already figured out your site, it would be able to detect whether the content/links/whatever is changing frequently, is of use and relevance to your site, and ultimately users. If yes, it crawls/indexes as normal. If no, because it has deemed the content/links to be ads/spammy/cloaking, etc., then they will no doubt de-value it.
Anyways, we all work our butts off in vain… Google was just outed for buying links to promote Google Chrome. What a fine example they offer to their loyal webmasters community.
Hi Matt,
The patent was originally filed in 2008, and it’s possible that they might have tried an approach similar to the one described in the patent sometime in the year before it was filed. As a website owner, I might find myself a little upset if my page started getting traffic for weather reports if I included a little weather report widget at the top of the page instead of getting visitors for the content that I create. The idea does sound pretty reasonable.
Hi Eliseo,
You’re welcome. Hope you have a happy new year as well.
Hi pankaj,
I’d agree that people should focus more on titles and meta descriptions rather than meta keywords. This post really isn’t about any of those, but rather on content that might make a site more interesting, but really doesn’t add to the content that is the focus of a page, like a weather report.
Hi The Cheap Traffic Guy,
It’s probably a pretty good practice to use something like JavaScript to display ads if possible, so that they don’t get indexed. I’m not completely sure if this patent would be effective with all random ads that appear on pages, but if you’re not choosing the content of the ads that are displayed, there’s a good possibility that you don’t want a search engine to index that content anyway.
Hi Bruce,
I have to say that I’m often surprised by some of the things that I run across in patent filings, and many times they involve topics that really haven’t been discussed much, and are worth spending some time on.
I think I’ve seen content from RSS news snippets and RSS Twitter streams get indexed by Google in the past, but I don’t know whether that might change overnight. I could see Google deciding at some point that kind of content might be transient, and if they do, it might be the kind of thing that they decide to stop indexing.
It does have some value, and displaying those feeds can sometimes be helpful to visitors, but if it means that the other content on the page doesn’t get as much of the search engine’s focus when it comes to indexing, then it could lead to visitors arriving at your pages who are more interested in content from the newsfeed than in the rest of your page. The question is, when visitors arrive at your page based upon content from those RSS feeds, do they stick around to read your page, or do they follow the link through the RSS feed?
Hi Frank,
You’re welcome.
Hi Assaf,
You might be focusing a little too much on “templates.” While Google might find transient content in templated content, it might find it in untemplated content as well. It’s just as likely that Google might identify a widget that produces something like a weather report on a page, and use the HTML structure of that widget to identify it on other pages of the same site.
I think the important thing to keep in mind here is that if you purposefully set up some content on your pages to behave in a way where it might be perceived as transient, and you want that content indexed, it’s possible that Google might not index it.
Hi Gulshan,
Thank you very much. I hope you have a great 2012 as well.
Hi Dan,
I’m not suggesting that transient content will harm you in any way.
But this patent does seem to warn that if you have content on your pages that does appear to be transient, Google may not index that content or use it to decide which advertisements to display.
If the transient content is something like a weather report, which might be completely unrelated to the rest of the content of your page, that’s actually probably a good thing.
If the content is something that you do want indexed, like a list of “featured” products and links to them on the rest of your site, it’s possible that Google might consider those to be transient if they change on every return visit – even a minute or so later. After reading the Yahoo patent I described in the post involving transient links, I often recommended to people that if they do something like a section on featured products, to try to change them daily or weekly instead of randomly to make it less likely that they might be mistaken as transient.
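One way to follow that advice on a site you control is to derive the featured selection from the date rather than from a fresh random draw on each request, so a crawler returning a minute later sees the same items. This is a minimal sketch under that assumption; the product list, the function name `featured_items`, and the seeding scheme are hypothetical, and nothing in the patent says Google would treat a daily rotation differently from a random one.

```python
import datetime
import random


def featured_items(products, day, count=3, period="day"):
    """Choose featured products that stay fixed for a whole day (or ISO week)."""
    if period == "week":
        iso_year, iso_week, _ = day.isocalendar()
        seed = iso_year * 100 + iso_week  # same seed for every day in the ISO week
    else:
        seed = day.toordinal()  # same seed for every request on this date
    # A private Random instance, so the global random state is left untouched.
    rng = random.Random(seed)
    return rng.sample(products, min(count, len(products)))


products = ["tent", "lantern", "sleeping bag", "camp stove", "backpack", "compass"]

monday = datetime.date(2011, 12, 26)
wednesday = datetime.date(2011, 12, 28)

# Two requests on the same day (e.g. a crawl and a recrawl a minute later)
# get the same selection.
print(featured_items(products, monday) == featured_items(products, monday))  # True
# With period="week", every day in the same ISO week gets the same selection.
print(featured_items(products, monday, period="week")
      == featured_items(products, wednesday, period="week"))  # True
```

Seeding from the ISO year and week is what keeps the weekly option stable across days; the trade-off is that the selection is predictable, which is usually harmless for featured products.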
Given that this idea of transient content, and how Google responds to it, hasn’t really been discussed much until this patent was granted, it might be worth experimenting with. Google may not be using it at all, or it could turn it on next week without any notice, or it could be something that they tried and may have even abandoned to move on to something new. Either way, it’s worth exploring.
Hi Anindita,
I’m not convinced that transient content should be looked at in the same way as boilerplate content. The term “boilerplate” implies content that is the same every time you visit, and that might not have much to do with the content of a page. It shares with transient content the idea that indexing it really won’t help a search engine deliver a searcher to a meaningful page, but transient content is identified in part by the idea that it changes so very quickly.
Hi Marko,
You’re welcome.
Hi Jeff,
I don’t think that this patent will have much of an impact on news sites and blogs that update frequently. It is more likely to keep transient content, from widgets that show things like random quotes, testimonials, weather reports, or other content that isn’t really related to the rest of the content of a page, from being indexed by the search engine.
Hi Ferly,
I agree with your assessment regarding how Google may identify the structure of a site using a blog format or newspaper format with frequently updated content, and treat that content as if it were actual content rather than transient content. I believe that the target of this patent is more the weather report type of content that appears on pages than things like excerpts from blog posts or news articles, which often link to full pages with titles that may match the titles of the excerpts, and content that is similar to the snippets shown on a main page.
Hi Assaf,
It seems like Google hasn’t targeted RSS feeds at this point from some searching that I’ve done, and it’s possible that they might not, or at least might not under this patent filing. But yes, if they decided to, it could have a big impact on a fair number of sites.
I don’t think that it’s something people should panic over, but being aware of the possibility, and finding some other ways to attract additional links to future-proof their sites against it, might be worth considering.
One thing seems unclear to me: How will Google be able to differentiate between, e.g., “most sold products” on an e-commerce site that constantly change and the more “shady” transient content created to make the site seem fresh all the time?
Nice. I’m sure Google already checks transient content to remove spammy, “flashing” links. Since 2009 we have not used this type of “link exchange system,” because it doesn’t work at all…
I had never heard of “transient content” even though we have been doing SEO for 6+ years. Wow. Google gets smarter every day. One issue with their approach might be social media feeds, which could change on every page refresh.
Thanks for sharing great article.
Trying to understand how Google looks at updated content is interesting. I have heard people say that they update content often in the belief that it is likely to yield better rankings; however, I think the lesson is that Google is working on tools to determine whether minor changes to content are really relevant to the content on a page that a user is searching for. For example, changes in privacy policies might not qualify as worthy of being indexed by that criterion. Your blog has a great article on how Google might discern the difference between incidental content such as weather updates, which change daily, and something like featured product descriptions linking to deeper pages on a site that are updated even more frequently.
Hey Bill, sorry I can’t comment much, because I need to translate the article into Indonesian and then study it. But one thing I would love to ask you: how long have you been working with SEO?
I was going to jump in and suggest that Google should have done this years ago, but on further consideration I bet it takes a shedload of processing power (and cumulative time) to perform this. I wonder how GoogleBot knows which sites are *likely* to have transient content, and re-crawls appropriately?
It’s crazy how sophisticated these crawlers are becoming. This was the first I had heard of this, appreciate the information. I’d share your concern that dynamic links to featured items could be flagged by Google and you could be penalized without even knowing it was happening. Troubling.
Great read! I wasn’t sure if Google actually did this, but it certainly makes sense, especially for link tickers and link farms that put random links on pages with rotation, etc.
Going to look more into it; it’s very interesting indeed!
Take care,
Leon.
Hi Bill!
I’m just curious, if I interpreted this correctly, would using Genesis Simple Sidebars to vary your sidebar content, based upon the individual page content, be beneficial to avoiding transient content ‘detection?’ I’m in real estate and post pages and posts often so, I would, of course, not be able to create a different sidebar for each post/page but, if I had 20 varieties, instead of 1, I wonder if that would be helpful?
Thanks! Interesting article.
-Debe
Hi Thomas,
That’s one of my concerns as well.
Hi Greg
If you’re concerned about a page being indexed and showing up in search results for the content found on that page, having a social media feed that constantly changes can make that page less relevant for the terms that appear upon the page. So if Google decides not to index content from the display of those feeds, it might not be that bad of a thing.
If you’re relying upon that same page receiving traffic for terms that show up in the social media feed, that’s another matter. A page might rank for terms that show up in a display of the feeds, especially if they contain very timely and topical terms, and there aren’t many other pages on the Web that include the terms.
I have been seeing pages showing up in search results for terms found in displays of RSS feeds for queries that are both long tail and very recency sensitive, so it appears that Google is still looking at those. But they could potentially stop doing that at any time, and justify doing so by saying that those RSS feeds do make pages less relevant for the main content on those pages.
Hi Sarkari,
Thanks. You might also find this post on minor and major changes to web pages, and how search engines might react to those interesting:
The Impact of Content Change on Search Engine Rankings
Hi Ucup
That’s part of why I added the Google Translate widget in the top right sidebar on this site – to make it easier for people who don’t speak English to read what I write. I know Google translations aren’t perfect, but I’m hoping they help.
I started working promoting websites in 1996. There’s a little more about me here:
https://www.seobythesea.com/about-seo-by-the-sea/
Thank you.
Hi Matt,
Doing something like a transient content check can mean that Googlebot might have to work harder and revisit pages more often.
It’s possible that Google might have some kind of predictive algorithm that might tell it that some sites might be more likely to have transient content, but I’m not sure that they do. I would guess that they might also do something like what Yahoo does with transient links at the same time, which would make revisiting pages to check those things even more worth doing.
Hi Charles,
I don’t know that you would potentially be penalized for random content and random links to featured products. But it’s possible that the search engine might ignore that content, or decide not to follow those particular links because of their transient nature.
Hi Leon,
Those are a couple of very good examples of where and why Google might be on the lookout for transient content. Thanks.
Hi Debe
It looks like the Genesis Simple Sidebars plugin enables you to have unique content within your sidebar for different pages and posts. I could imagine that you could include transient content within those sidebars if you wanted (like random testimonials or RSS feeds from Twitter, or other changing content), but it looks like you could put content within them that doesn’t change frequently as well.
For example, if you have 10 pages on a real estate site about 10 different neighborhoods, and you decided to put some “facts” about each of the neighborhoods appropriate to those individual pages, you could use that sidebar to do it. The neighborhood 1 page could have facts about neighborhood 1 in its sidebar such as the schools in the area, the average price of homes, if there are parks nearby, and so on. The neighborhood 2 page could have facts about neighborhood 2 in its sidebar, and the neighborhood 3 page could display neighborhood 3 facts.
That’s not transient content in that visitors (and search engines) keep on seeing the neighborhood 3 facts when they visit the neighborhood 3 page. It doesn’t change on every visit.
Hey Bill,
First of all, let me say that this is a very informative blog. I’m kind of new to building websites and blogging, so I’ve been trying to learn some of the factors my website might be evaluated by. Since I’m in real estate, I was wondering whether my house listings page could be considered transient content. The listings do change, sometimes frequently, so reading this post made me wonder whether that would affect my website. As my business is selling houses, it is important to me for Google to see this information as relevant. Is there a way to mark my listings as “not transient”?
First time on your Blog. Interesting article you wrote.
To me, Google’s method for identifying ‘transient content’ is more robust than Yahoo’s. But Yahoo’s method is old, and I doubt that they are still using the same strategy since 2007.
Boutros.
Hi Matt,
Thank you. I don’t think that this approach from Google is intended to impact house listings, or ecommerce product listings that might be listed one day, and then removed the next.
Instead, they are looking at things like snippets of text that might appear on a page, such as the random display of testimonials, or a weather indicator at the top of a page that might update a few times a day or even possibly daily.
Google hasn’t officially announced anywhere that they are actually doing this review of transient content as far as I know, except in this patent granted to them. I haven’t seen anything from them in a help page or a blog post that actually covers this, and a method to mark something as “not transient.” That doesn’t mean that they aren’t doing something like this.
But I would suspect that if they are, that your listings of houses wouldn’t be the kind of thing that they would want to impact with it.
Hi Boutros,
Happy to hear that you found this post interesting.
The Yahoo approach was for a slightly different purpose. It looked at links instead of content, and would try to determine whether a link was transitory or not by possibly recrawling a page within a short time period after a first visit to see if any links had changed. The idea is that links that tend to change that quickly are often to pages that Yahoo might not want to crawl, such as advertisements. We don’t know for certain whether Yahoo actually used that method, but it seems like a reasonable assumption to make on their part.
I think when dealing with fresh content, the first thing is to figure out whether the queries you are dealing with favour older documents over new ones. If they don’t, then you may not have to deal that much with the freshness factors. The importance then lies in making sure that the content is relevant over time and always useful for visitors.
Nice reading.
Even though we don’t know if this “transient” content approach has been implemented, it’s a nice thing to know, and be aware of.
I wondered whether it could also be considered “transient” content if a module on a website shuffles different firms with company information, etc., in a random order?