Tomorrow the footers of a great number of websites will automatically change to show a new copyright date. Others will wait for site owners to manually code the change. It’s a change worth making, because it shows visitors that the sites are maintained and up to date. As changes go, though, it’s a fairly insignificant one, and it likely won’t have much influence on the rankings of pages in search results. Many pages on the Web change in minor ways every day, including updates to visitor counters, subtle changes in formatting, and new advertisements shown on pages.
Many other web pages change in more significant ways on a regular basis, from blog home pages that show new posts, to news media sites that might add new storylines every 15 minutes, to social sites that constantly change as multitudes add updates.
How frequently a search engine crawler might visit a particular page on the Web can depend in part upon how often the page is updated. For example, a news site that updates every hour might have Googlebot or MSNbot sniffing around hourly to devour new content.
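To make the idea of update-driven crawling concrete, here is a minimal sketch of an adaptive revisit schedule. The halve-or-double rule, the function name, and the one-hour and thirty-day limits are all my own assumptions for illustration; they are not how Googlebot or MSNbot are documented to behave.

```python
def next_crawl_interval(interval_hours, changed, min_hours=1.0, max_hours=720.0):
    """Return the wait before the next visit (hypothetical rule).

    If the page changed since the last visit, come back twice as soon;
    if it did not, wait twice as long. The result is clamped to
    [min_hours, max_hours].
    """
    factor = 0.5 if changed else 2.0
    return max(min_hours, min(max_hours, interval_hours * factor))


# A page first crawled daily that changes on three visits in a row,
# then stays the same on the fourth visit.
interval = 24.0
for changed in [True, True, True, False]:
    interval = next_crawl_interval(interval, changed)

print(interval)  # 24 -> 12 -> 6 -> 3 -> 6 hours
```

Under these assumptions, a page that keeps changing drifts toward hourly visits, while a page that never changes drifts toward monthly ones.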
It’s easy to assume that when a search engine crawls and indexes that new content, it’s only looking at the content that exists at the time of a visit, and not accounting for how much has actually changed on the page since the last visit. But what if search engines pay attention to the frequency of changes to pages, and record the amount and type of content that changes?
What if a search engine tracked these changes, and the changes themselves influenced rankings?
A recently published patent application from Microsoft refers to the tracking of changes in documents over periods of time as “Temporal Dynamics,” and looks specifically at changes to things such as:
- Terms included in or associated with a document,
- Anchor text in a document,
- Colors and sizes of images,
- Tags assigned to documents,
- The positions of text or images,
- Queries used to retrieve the page,
- Amount (volume) that a document changes over time,
- Frequency/rate that a document changes over time,
- Nature of changes made to the document,
- Other changes that may occur over time.
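A few of the items in that list (terms added or removed, the volume of change, and the frequency of change) can be sketched in a handful of lines. This is only an illustration, assuming a search engine keeps the plain text of each crawl as a snapshot. The function names and the particular volume and frequency measures are my own assumptions, not the ones described in the patent filing.

```python
import re
from collections import Counter


def term_counts(text):
    """Count lowercase word-like terms in a snapshot's text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def diff_snapshots(old_text, new_text):
    """Compare two snapshots of the same page (hypothetical measures)."""
    old, new = term_counts(old_text), term_counts(new_text)
    # Counter subtraction keeps only positive counts, so these two sums
    # are the term occurrences dropped from and added to the page.
    changed = sum((old - new).values()) + sum((new - old).values())
    total = sum(old.values()) + sum(new.values())
    return {
        "added": set(new) - set(old),
        "removed": set(old) - set(new),
        "volume": changed / total if total else 0.0,
    }


def change_frequency(snapshots):
    """Fraction of consecutive crawls on which the page text differed."""
    pairs = list(zip(snapshots, snapshots[1:]))
    if not pairs:
        return 0.0
    return sum(1 for before, after in pairs if before != after) / len(pairs)


# Three crawls of a footer: the copyright year changes once.
crawls = [
    "Copyright 2010 Example Hotels",
    "Copyright 2011 Example Hotels",
    "Copyright 2011 Example Hotels",
]
footer_diff = diff_snapshots(crawls[0], crawls[1])
print(footer_diff["added"], footer_diff["removed"], footer_diff["volume"])
print(change_frequency(crawls))
```

For the footer example, the only term added is "2011", the only term removed is "2010", the volume works out to 0.25, and the page changed on one of two revisits.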
How this information is used may depend upon whether the query used in a search is deemed informational or navigational.
An informational query is one where there may be an intent by a searcher to find information on a topic, such as “How do I add drop shadows to words using CSS?”
A navigational query is one where someone is searching for a particular page, such as typing “Hilton” into a search box to find the Hilton Hotels homepage.
When someone is looking for information about recent events or something fairly new, a searcher might benefit if the search engine shows a page where new terms have recently entered the vocabulary of the document. For my example informational query above (“How do I add drop shadows to words using CSS?”), that might mean that a page that recently added the phrase “CSS 3.0” might be boosted in search results.
For navigational queries, we’re told that a page with content that hasn’t changed substantially over a period of time might be boosted in search results. I’m not sure how well that works with news and media sites where the content changes on a regular basis, such as a navigational query for ESPN or NYTIMES, and the patent filing doesn’t seem to address that issue.
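The two cases can be put side by side in a rough sketch. Everything here is an assumption made for illustration: the function name, the inputs, and the simple 1.0 to 2.0 multiplier are mine, and the patent filing does not give this formula.

```python
def temporal_weight(query_type, query_terms, recently_added_terms, change_volume):
    """Return a hypothetical relevance multiplier between 1.0 and 2.0.

    query_type: "informational" or "navigational"
    query_terms: terms in the searcher's query
    recently_added_terms: terms that newly appeared on the page
    change_volume: 0.0 (page unchanged) to 1.0 (page fully rewritten)
    """
    if query_type == "informational":
        # Reward pages that recently added terms the searcher asked about.
        wanted = {term.lower() for term in query_terms}
        if not wanted:
            return 1.0
        added = {term.lower() for term in recently_added_terms}
        return 1.0 + len(wanted & added) / len(wanted)
    if query_type == "navigational":
        # Reward pages whose content has stayed stable over time.
        return 1.0 + (1.0 - change_volume)
    raise ValueError(f"unknown query type: {query_type!r}")


# A page that recently added "css3" and otherwise changed very little.
info = temporal_weight("informational", ["css3", "shadows"], {"css3"}, 0.25)
nav = temporal_weight("navigational", ["hilton"], set(), 0.25)
print(info, nav)
```

Under these assumptions the same page gets a multiplier of 1.5 for the informational query and 1.75 for the navigational one. A news homepage that changes constantly would score close to 1.0 on the navigational side, which is exactly the ESPN and NYTIMES problem mentioned above.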
The patent is:
Assigning Relevance Weights Based on Temporal Dynamics
Invented by Susan T. Dumais, Jonathan Louis Elsas, and Daniel John Liebling
Assigned to Microsoft
US Patent Application 20100325131
Published December 23, 2010
Filed: June 22, 2009
A system described herein includes a receiver component that receives a first dataset, wherein the first dataset comprises temporal dynamics pertaining to a document that is accessible by a search engine, wherein the temporal dynamics comprise an identity of a term corresponding to the document and an indication that the term has been subject to change over time. The system also includes a weight assignor component that assigns a relevance weight to the document based at least in part upon the temporal dynamics pertaining to the document, wherein the relevance weight is utilized by the search engine to assign a ranking to the document with respect to at least one other document when the search engine retrieves the document.
The patent provides a much more detailed description of a process that could be used to track changes to a web page, and use those changes to influence the rankings of search results.
What I found interesting wasn’t so much Microsoft’s process itself, but rather that they might capture information about changes to a page, and possibly use that information to influence search results.
Add a new keyword phrase to a page’s title and heading and a sentence or two of the content, and a search engine tracks those changes. Add a couple of new pictures to an old blog post, and the search engine makes a note of it. Completely rewrite the content of a page a couple of times in a short period, and the search engine might decide that the page is more appropriate for informational queries than navigational ones.
While this is Microsoft’s patent, it might make sense for Google to consider carefully how pages change over time as well. It seems the search engines may not just look at a webpage as it exists today, but may have a memory that considers what a page was like in the past, and how it may have changed.
I could see change information about pages being used in a number of ways by a search engine that go beyond deciding whether or not a page is more informational or navigational in nature. Changes to a page might signal a change in ownership of a site, a possible intent to spam, an attempt to provide new information and make a page fresher, and other things as well.
When you make significant changes to a web page, what signal might you be sending to the search engines?
Oh, and Happy New Year. May all your changes over the coming year be good ones.
Added (12/31/2010 at 12:09 pm): A couple of days ago, I received an email offering me $125 to add some text and a link to a commercial page on a six-month-old blog post on another blog. How much would a change like that stand out, if a search engine not only indexed a page according to its present state, but also looked carefully at a history of changes to a page?
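A change like that would be simple to spot if earlier copies of a page were kept. Here is a minimal sketch, assuming a search engine stores the HTML from each crawl. The class and function names and the example URLs are hypothetical, and nothing here comes from the patent filing itself.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def links_added(old_html, new_html):
    """Links present in the newer snapshot but not in the older one."""
    return extract_links(new_html) - extract_links(old_html)


# An old post, and the same post after a paid sentence and link are added.
old_post = '<p>Original post. <a href="https://example.com/source">source</a></p>'
new_post = old_post + '<p>Buy <a href="https://shop.example/widgets">widgets</a>.</p>'

print(links_added(old_post, new_post))
```

In this example the only new link is the one to https://shop.example/widgets. A link that shows up months after a post was published, pointing to a commercial page, is the kind of change that a history of snapshots would make visible.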