In March, one of the more interesting patent filings from Google was granted, Information retrieval based on historical data.
I had discussed it on forums when the original patent application came out in March of 2005, but didn’t provide a write-up of the document here. I realized a few weeks ago that I probably should.
The historical data patent is important because it discusses many techniques that a search engine might use in fighting “spamming techniques” that might artificially “inflate” the rankings of websites. It works to identify “stale” sites that may be ranked higher than fresher sites containing more recently updated information.
I’ll be writing a few posts over the next few weeks about the patent, and will try to include some updates that have happened since it was first published. This first post looks at how the “freshness” of a page or document might influence its rankings in search results.
Fresh and Stale Web Pages
How does a search engine tell how fresh a web page might be, or how stale it is? What do those words even mean? Why is it important? What difference does the age of a page make? Does staleness or freshness depend upon the content on a page?
The Constitution of the United States is an old document, but it’s not stale. A news article about the “World Series” from 1918 may not be what a baseball fan wants to see when searching for the “World Series” this October.
While Babe Ruth is well known as a feared slugger for the New York Yankees, he’s not as well remembered from his earlier days as a Boston Red Sox pitcher who threw a shutout in that 1918 World Series. Interesting information, but again, not what a searcher is likely to be looking for in an October 2008 search for “World Series.”
How do we tell the age of a document, and determine whether or not it is stale? What types of things would be used to give a score to a document based upon that age?
A search engine might look at information from different sources to learn about:
- The age of a document
- The age of links leading to and from that document
Determining the Age of a Document
The history of a document, such as its age and information about links to it, can influence ranking scores under this historical data patent. A search engine needs a starting date for a document, also referred to as a document inception date.
A search engine might look at the following to decide how old a page might be:
- When it is first crawled by the search engine
- When it is first submitted to the search engine
- When the search engine first discovers a link to the document
- When the domain was registered
- When the page was first referenced in another document
- When a document first reaches a certain number of pages
- By the time stamp of the document on the server it is hosted upon
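To make the list above concrete, here is a minimal sketch of how candidate dates like these might be combined into a single document inception date. The patent doesn't say how a search engine would choose among them. Taking the earliest known date is purely my assumption, and the `DateSignals` field names are hypothetical labels for some of the signals listed above.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class DateSignals:
    """Hypothetical container for dates a search engine might know about a page."""
    first_crawled: Optional[date] = None
    first_submitted: Optional[date] = None
    first_link_discovered: Optional[date] = None
    domain_registered: Optional[date] = None
    first_referenced: Optional[date] = None
    server_timestamp: Optional[date] = None


def estimate_inception_date(signals: DateSignals) -> Optional[date]:
    """Return the earliest known date, or None if nothing is known.

    Choosing the earliest date is an assumption for illustration only.
    """
    known = [getattr(signals, f.name) for f in fields(signals)]
    known = [d for d in known if d is not None]
    return min(known) if known else None


# Example: the link was discovered before the page itself was crawled.
signals = DateSignals(
    first_crawled=date(2008, 3, 14),
    first_link_discovered=date(2008, 2, 27),
)
print(estimate_inception_date(signals))  # 2008-02-27
```

A real system might weight or cross-check these signals rather than simply taking the minimum, since some of them (like a server time stamp) are easier to manipulate than others.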
Under a link-based ranking system that doesn’t use age-based information, a document with fewer links to and from the document may rank lower than a document with more links to and from it.
In a system that does use age-based information, if a document with fewer links can be determined to be newer, based upon the document inception date, it could possibly rank higher than an older document that has more links if it has a higher rate of growth of links.
But too many links coming too quickly to the newer document may also be a sign that some type of spamming is happening (see how Yahoo may handle this issue).
So, how is that rate determined, and how much does it influence the overall ranking of a page?
This complicated-looking formula is given as one way of determining how the age of a document might influence how it ranks:
H = L / log(F + 2)
where H may refer to the history-adjusted link score, L may refer to the link score given to the document, which can be derived using any known link scoring technique (e.g., the scoring technique described in U.S. Pat. No. 6,285,999) that assigns a score to a document based on links to/from the document, and F may refer to elapsed time measured from the inception date associated with the document (or a window within this period).
The patent referred to in that quote is one of the original PageRank patents – Method for node ranking in a linked database. This method of influencing rankings could adjust how PageRank might impact search results.
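Here is a rough sketch of that adjustment, assuming the formula reads H = L / log(F + 2) as the patent text gives it. The patent doesn't state the base of the logarithm or the unit of time, so the natural log and the days used here are my assumptions, and the function name is made up for illustration.

```python
import math


def history_adjusted_link_score(link_score: float, elapsed_days: float) -> float:
    """H = L / log(F + 2), with F in days and a natural log (both assumed)."""
    if elapsed_days < 0:
        raise ValueError("elapsed time cannot be negative")
    # The "+ 2" keeps the denominator positive even for a brand-new document.
    return link_score / math.log(elapsed_days + 2)


# Same link score, different ages: the newer document keeps more of its score.
new_doc = history_adjusted_link_score(10.0, elapsed_days=1)
old_doc = history_adjusted_link_score(10.0, elapsed_days=3650)
print(new_doc > old_doc)  # True
```

Under these assumptions the score shrinks only logarithmically with age, so an older page with a much larger link score could still come out ahead of a newer one.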
The historical data patent explains that sometimes some “older documents may be more favorable than newer ones” and that some sets of results can be fairly mature. The scores of documents can be influenced (positively or negatively) by the difference between the document’s age, and the average age of documents resulting from a query.
So, a fairly new site that appears amongst a set of results that are, on average, fairly old may find that the age difference negatively influences its ranking.
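One way to picture that comparison is the sketch below. The patent describes the idea but not the arithmetic, so the simple difference from the mean used here, along with the function name and the sample ages, are assumptions of mine.

```python
from statistics import mean


def age_gap_from_results(doc_age_days: float, result_ages_days: list[float]) -> float:
    """Difference between one document's age and the average age of the
    documents returned for a query (positive means older than average)."""
    if not result_ages_days:
        raise ValueError("need at least one result to compute an average age")
    return doc_age_days - mean(result_ages_days)


# A 30-day-old page among results that are mostly several years old.
result_ages = [30, 1500, 2000, 2200, 2500]
gap = age_gap_from_results(30, result_ages)
print(gap < 0)  # True: the page is newer than the average result
```

Whether a gap like this helps or hurts a page would depend on the query, since the patent says the influence can be positive or negative.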
Other Ways of Determining Freshness
Since the patent application was published, a new patent filing came out from one of the inventors listed on the original, Systems and methods for determining document freshness.
This follow-up patent application from Monika Henzinger added another way of looking at how fresh a document might be. It takes a look at how fresh the pages and links pointing to a document are to determine the freshness of that document.
Google’s patent application, Interleaving Search Results, on how it blends different kinds of search results (Universal Search) mentioned freshness as one of the ranking factors for News Stories that it might include in search results. More about that patent filing in How Google Universal Search and Blended Results May Work.
A New York Times article from last year, Google Keeps Tweaking Its Search Engine, uncovered a Google initiative that goes by the name QDF, or Quality Deserves Freshness. It discusses whether topics are “hot,” and whether people are writing about those topics in the news, in blogs, and whether searchers are looking for information about those in searches.
Looking at user behavior and click-throughs to pages are other ways of determining whether a document is fresh or stale. The patent includes those as other ways of determining just how fresh pages might be. I’ll address those topics in a future post.
Should freshness be part of how a search engine ranks pages? If you run a website, how fresh are the pages of your site, and how can you make them fresher if they seem on the stale side?
Part 2 of this post is: Updating Google’s Historical Data Patent, Part 2 – Changing Content
35 thoughts on “Updating Google’s Historical Data Patent, Part 1 – Freshness”
Great article. The analogies and tying some of the search patent pieces together make for one of the most informative posts I’ve read in a while.
I think it is a topic that’s worth revisiting, and exploring in a few different ways. It covers such a wide range of topics, from the weight of links, to the impact of anchor text, to how click-throughs might influence rankings, and more, that it seemed to make sense to break it into parts.
I’m a little sad to see summer end, but Autumn seems like it’s going to bring some interesting days ahead. I hope yours is enjoyable, and I suspect that you’ll miss having the kids around. 🙂
Great stuff Bill…. I covered historical ranking factors in 3-4 posts this year. It is certainly an area that is not often talked about in SEO circles BUT should be.
Can’t wait for the next posts in the series ;0)
I hope the summer went well for you my friend, me? Kids are back in school next week… yaaaaaaay
Great post Bill. I’m always appreciative of SEOs who spend their time studying the more complex stuff like your freshness post, and then are open enough to share it with the community.
It pains me to see the Babe in the wrong uniform, especially after the events of the last couple of days.
Good post as always Bill.
I do think the idea of freshness should play a role in search results, though naturally it should depend on the query. I suspect it will be easier to organize the results based on freshness than it will be to understand when a query’s intent should lead to a freshness-based result.
Then again I’ll often type a year into a query because I do want something fresh and still find some very old results. Probably picking up a copyright date or something similar.
You’re right this topic can be explored and expanded into a wider range of topics just for the link analysis alone.
When I found the picture of Babe Ruth, the Red Sox uniform looked pretty out of place to me too, even though it was exactly what I was hoping to find.
Experimenting with freshness in searches leads to some interesting results. A Google search for “superbowl” shows a lot of 2008 and 2009 results with many of the 2009 results being pages selling tickets for the event. A Google search for “presidential elections” shows some 2008 specific results, some “fresh” news stories in the middle of the search results, and some history/reference resources that look pretty up to date with an exception or two.
I think the impact of freshness does depend upon the query being searched for. There likely are some kinds of queries that are impacted less by “freshness” and that’s probably a good thing. It’s something to keep in mind when looking at query terms though.
There are some interesting aspects of the patent when it comes to links and anchor text. I’m looking forward to digging deeper into those, too.
Thanks for your kind words. I’m hoping that asking lots of questions, trying to provide some answers, and looking at some new patent filings and articles and blog posts that might seem to be related can spur some discussion.
And that it can provide some ideas for people who visit here regardless of whether they are marketers, site owners, or people who just have an interest in search and want to understand a little more about what might be happening after they type something into a search box at a search engine and then receive a set of search results.
So many of us rely upon search engines, and yet we don’t know what goes into the delivery of the results that we see in response to our searches. Freshness is an important aspect of how relevant results might be, but it’s something that isn’t discussed much. It probably should be talked about more. 🙂
Would freshness also be determined by whether the actual content on the page has changed? For websites like blogs and news pages this kind of freshness is definitely a good factor for determining a page’s usefulness… But some pages may just be “done” and its content doesn’t need to be updated or freshened up.
I wonder how much of the content has to be refreshed to make it more “appealing” to the Search Engines.
Sure, some parts of a web site are likely to hardly ever update, but other portions of the site could be “freshened” up from time to time. Do you think this would help, or are they more interested in seeing more new content appearing on the site and the older content not getting refreshed at all?
That is one of the topics that I will be addressing in a future post on the patent. Changes in content are something that a search engine does consider, and a search engine may look at content on different topics differently based upon the content being considered. The genre, to use that term, like “blogs” or “news” may be important. It’s interesting that Google is now showing dates of many blog posts in search results.
I think that the importance of freshness of content may have to be viewed not only in the context of an individual page or site, but also in its framework – the other pages and documents on the Web that may involve similar content.
Great post as always Bill!
I do think the idea of freshness may have some impact on search rankings. However, the strength of a link seems to grow over time, and many old pages rank very well.
Some sites go stale naturally. I’m sure there will be blogs about the 2008 election that will hibernate and I’m pretty sure that people in 2016 will search for “why McCain lost to Obama in 2008” and I hope that the top results are blogs and not wikipedia.
Nice stuff Bill…
I don’t think that search engines give higher weightage to the age of a domain. But I agree that search engines give higher weightage to fresh content.
I have observed many times that fresh pages rank first… Later those pages move down the results. We can maintain a particular position in search engines only by link building…
Thank you Bill…
waiting for next post 🙂
One observation about the time stamp on a web page – Web pages that are created with PHP ( or some other server-side scripting language ) will always “look” fresher than static HTML pages.
Pages served up by PHP etc. will always have the date and time on them when the page was loaded or crawled, since they are dynamically generated by the server at the time of request.
Whereas HTML pages will have a date and time stamp based on the last time the page was uploaded to the server, possibly making the page look “stale” to a search engine like Google or Yahoo.
It seems possible that pages created with PHP etc., may have an unfair advantage with this “freshness” model over static HTML, even though the actual content of the page may not be fresh at all.
Hi Denver SEO,
Yes, I agree about some sites going stale naturally. I do hope those results you mention are from blogs from 2008 also, instead of a history rewritten for the wikipedia, eight years from now.
I have some doubts about Google providing greater rankings based upon the age of a domain, too.
Freshness does seem to play a role, especially for topics that seem to be very popular.
Hi People Finder,
The time stamp issue is one that could cause problems, which is why the patent filing will consider some other indications of time, and why the followup patent application from Monika Henzinger provides an alternative approach which looks at the age of sites and links to a page, instead of a timestamp upon that page.
Helpful article. I like the “Other Ways of Determining Freshness” part of the article above. And to answer your last question: yes, definitely.
Bill, I am glad you made the comment about the Constitution. I get questions all the time about whether frequent changes on a page are good for ranking. My answer is always an unequivocal “maybe”. I suspect that for certain verticals, it might be helpful, but it shouldn’t be. A page should be considered fresh if new links are still coming in. In fact, unchanged text and fresh links is probably the ultimate signal of authority.
It is interesting to see some of the other news that has come out after the historical data patent was published. No doubt that freshness is playing an important role in what search engines are trying to do, and it may be essential that they find a way to filter search results by considerations like that in addition to “relevance” for a query submitted. If the search results that they show to searchers aren’t “fresh” then those searchers may look elsewhere.
I think that a “maybe” is often the best answer for many questions regarding what a search engine is doing, and will do in the future. Having said that, I think you make a very good point about a page that hasn’t changed much over time and is still getting new links pointed towards it – that seems to be a decent indication that people find the content on that page to be of lasting and timely value.
I don’t feel that search engines give higher rankings for the age of a domain. But I really do agree about the freshness of content.
Whenever I post fresh content on a site, it does lead to better rankings. And I think a domain’s rankings will also be determined by the number of quality backlinks coming in to the site.
If search engines did give higher rankings for the domain, then would someone just simply get a high-search keyword domain and let it stay there, without content? And the domain would generate rankings?
Waiting for your reply
I’m not certain that a search engine gives higher rankings based upon the age of a domain either, but there may be something in the value of a history of content, and of the acquisition of links pointing to the pages of a domain over time, and other indications of being established that may aid it.
We know that one aspect of ranking – the link analysis part – requires getting links to a page – those provide a search engine with an idea of how important a page might be (on the assumption that if important pages link to another page, it’s likely the page they are pointing to is important, too). While it’s possible that people may point links to empty domains, it’s more likely that they will link to a page that contains content that they find interesting or useful or helpful or controversial or worth returning to for one reason or another. An empty page won’t do that.
We also know that search engines will consider a wide variety of ranking signals, including visible ones such as content upon the pages themselves. While a domain name may help, especially if people link to a page using the domain name in the anchor text to the links, by itself it may not produce a very strong signal to search engines that it should be considered relevant for a query that someone is searching for, regardless of how long it may have been registered as a domain.
Thanks for your reply. I have understood it well. But there is actually no statistical proof to show that search engines do provide higher rankings for a domain. Yes, I do agree that a domain used over the years, even if it’s just static, does have PR if the domain is for a high-search keyword.
I too agree that content is one of the most important parts of SEO and ranking in Google. Anchor text from high-PR sites especially does affect a site’s ranking a lot. I strongly believe that backlinks really do pass the power juice of PR when a higher-PR site links to the site.
Just for the fun of it, I will try to get a high-search keyword domain, and do nothing but leave just some keywords on the domain. Then over time, say 2 years, maybe we can make a real test of this, as I really do not believe that search engines provide higher rankings for a domain.
Unless we have statistical proof that there are real domains that get this kind of ranking from search engines based on the domain alone.
Waiting for your kind reply. And foremost, I mean no offence, Mr. Bill.
Anyway… I love your blog and I love reading it. It’s just that there are too many pages and links for me to read from day to night. Cheers.
Thank you for your kind words, and thoughtful comments.
There are so many ranking signals for search engines, that isolating one factor, like the age of a domain, may not be something that can be realistically done. Testing in an environment where controls can’t be put into place, and where there are so many factors that we are unaware of may make such an experiment virtually impossible. 🙁
One would imagine that dates in URLs could be a useful indicator of document age too.
Thanks. We take a lot of signals for granted that maybe we shouldn’t.
There are CMS systems that do include dates in URLs, and it’s possible that those might be viewed as an indication of document ages, much like post dates in blogs, and the dates associated with edits in wiki software.
One concern I would have with looking at a date expressed in a URL (or blog post date) would be that someone could come in and change the content of such a page, and the date wouldn’t be an accurate indication of the age of the content on the page. But you raise a good point – we may not anticipate the form of some information that could provide us with dates and times that something might be created.
A very informative article, thank you – again. I’m not sure if freshness should be a huge consideration. I’ve done websites before for customers who want to promote their shop / restaurant / pub. The sites usually have a page describing the business, an overview of their services and a contact us page – maybe a menu if they serve food. Sometimes this is used where an ecommerce site isn’t necessary or wanted.
If Google made this play a heavy part in their algorithm then it wouldn’t help those sites very much, yet they are there to serve a purpose and probably are of a lot of use to people.
It is possible that freshness plays more of a role for some kinds of sites than others.
For example, a page about sports news might be one that should be boosted when it contains fresh information, and a page about a restaurant might not, or at least freshness might not carry as much weight.