A new patent application (a rare short one) from Monika Henzinger of Google adds a way to estimate the freshness of a web page, based on both the “last-modified-since” information a search spider receives about the page itself and a review of the same information for the pages that link to it.
There are other examples of how this works, but here’s one from the patent application:
As another example, if the number of “fresh” documents of the set of documents containing links to document p is greater than the number of “not fresh” documents of the set of documents containing links to document p (i.e., as determined by freshness attribute(s) associated with each document of the set of documents), then document p can be considered “fresh,” and a corresponding “high” freshness score F.sub.r may be assigned to document p. To illustrate, if each document of set of 100 documents containing a link to document p has a freshness attribute, such as, for example, a HTTP “last-modified-since” attribute, that indicates that 70 of the documents have been recently modified or updated and, thus, are fresh, then a “high” freshness score F.sub.r can be assigned to document p.
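The majority rule in that passage is simple enough to sketch in a few lines of Python. This is only an illustration of the example as quoted, not Google's implementation: the function names, the 30-day cutoff for “recently modified,” and the “low” label for everything else are my own assumptions, since the excerpt only says when a “high” score may be assigned.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_modified, now, max_age):
    """Treat a document as fresh if it was modified within max_age of now."""
    return now - last_modified <= max_age


def freshness_score(linking_last_modified, now, max_age=timedelta(days=30)):
    """Score document p from the last-modified times of the pages linking to it.

    The 30-day default and the "low" fallback are assumptions made for this
    sketch; the quoted passage only describes the "high" case.
    """
    fresh = sum(1 for ts in linking_last_modified if is_fresh(ts, now, max_age))
    not_fresh = len(linking_last_modified) - fresh
    # "high" only when fresh linking documents outnumber the not-fresh ones.
    return "high" if fresh > not_fresh else "low"


# The patent's illustration: 100 linking documents, 70 recently modified.
now = datetime(2005, 6, 1, tzinfo=timezone.utc)
linking_pages = [now - timedelta(days=5)] * 70 + [now - timedelta(days=400)] * 30
print(freshness_score(linking_pages, now))  # prints "high"
```

With 70 fresh linking pages against 30 that are not, the fresh ones are in the majority, so document p gets the “high” label. A 50/50 split would not, because the passage requires the fresh count to be strictly greater.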
Needless to say, fresh documents may rank higher than documents that aren’t fresh.
This patent application is a continuation of Google’s patent application Information Retrieval Based on Historical Data.