In 2002, Jon Kleinberg wrote a paper about how frequently terms and phrases appeared in the emails he received or the news articles he read, and how some terms would suddenly become popular over hours or days and then lose that popularity. He named this behavior burstiness, after a networking term.
For example, as a professor, he would receive a lot more emails that contained the word “prelim” in the few days before midterm exams.
The paper is Bursty and Hierarchical Structure in Streams. Paying attention to bursts of activity related to certain terms, like those described in the paper, might tell us something interesting about the times when certain buzzwords became more popular.
Imagine taking this idea of the burstiness of phrases appearing in emails or news articles and instead looking for the burstiness of phrases appearing in search queries at a search engine. Would pages that include phrases that have suddenly become more popular in searches over a short period of time be pages that searchers might be more interested in seeing?
When people search for different query terms, a search engine can track how frequently those terms are searched for in its query logs. If the frequency of searches for a query term has increased or decreased, that change in popularity could be noted.
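As a rough sketch of that kind of tracking, the Python below counts searches per term per day and reports the change between two days. The log format (pairs of day and query) and the function names are assumptions made for illustration; the patent filing doesn't spell out how a query log would be stored.

```python
from collections import Counter, defaultdict


def daily_query_counts(log_entries):
    """Count how often each query was searched on each day.

    log_entries is an iterable of (day, query) pairs. This format is a
    hypothetical stand-in for a real search engine query log.
    """
    counts = defaultdict(Counter)
    for day, query in log_entries:
        # Lowercase the query so "New York Yankees" and "new york yankees" match.
        counts[query.lower()][day] += 1
    return counts


def frequency_change(counts, term, earlier_day, later_day):
    """Return the change in search count for a term between two days."""
    per_day = counts.get(term.lower(), Counter())
    # A Counter returns 0 for days with no recorded searches.
    return per_day[later_day] - per_day[earlier_day]


# Made-up log entries, for illustration only.
log = [
    ("2008-12-01", "New York Yankees"),
    ("2008-12-02", "new york yankees"),
    ("2008-12-02", "New York Yankees"),
    ("2008-12-02", "baseball"),
]
counts = daily_query_counts(log)
change = frequency_change(counts, "New York Yankees", "2008-12-01", "2008-12-02")
```

A positive change means the term was searched more often on the later day, and a negative change means interest dropped off.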
Pages with terms that are searched for frequently and terms that have seen an upsurge of searches over a period of days might be given a keyword usage score that could increase how well those pages rank in search results. Those popular terms could also be recommended to advertisers at a search engine to take advantage of the increased popularity of the terms.
Those uses are explored in a recent patent application from Microsoft.
The search query burstiness patent application is:
Keyword Usage Score Based upon Frequency Impulse and Frequency Weight
Invented by Hua-Jun Zeng, Hua Li, Jian Hu, Han Peng, Zheng Chen, and Jian Wang
Assigned to Microsoft
US Patent Application 20080301117
Published December 4, 2008
Filed June 1, 2007
Abstract
A method and system for assessing keyword usage based on the frequency of usage of keywords during various periods is provided.
A keyword usage measurement system is provided with the frequency of keywords during various periods. The measurement system then calculates a recent usage score for a keyword by combining a frequency impulse score for the keyword with a frequency weight.
The frequency impulse score for a keyword indicates whether a recent change in the frequency of the keyword has occurred. The frequency weight for a keyword indicates a recent measure of the frequency of the keyword.
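To make the two pieces in the abstract a little more concrete, here is a minimal sketch in Python. The abstract only says that a frequency impulse score is combined with a frequency weight. The formulas below (a smoothed ratio for the impulse, a logarithm for the weight, and multiplication to combine them) are my own assumptions for illustration, not the calculations from the patent filing.

```python
import math


def frequency_impulse(recent_count, baseline_count):
    """Assumed impulse: ratio of recent to baseline search counts.

    A value above 1 suggests a recent rise in searches. Adding 1 to both
    counts avoids division by zero for terms with no earlier searches.
    """
    return (recent_count + 1) / (baseline_count + 1)


def frequency_weight(recent_count):
    """Assumed weight: log-scaled recent search count."""
    return math.log1p(recent_count)


def recent_usage_score(recent_count, baseline_count):
    """Combine impulse and weight by multiplication (an assumption)."""
    return frequency_impulse(recent_count, baseline_count) * frequency_weight(recent_count)


# Made-up counts: a popular term with steady interest, a popular term that
# has jumped tenfold, and a rare term that has also jumped tenfold.
steady = recent_usage_score(1000, 1000)
bursty = recent_usage_score(1000, 100)
rare_burst = recent_usage_score(10, 1)
```

With these assumed formulas, a term that is both frequently searched and rising quickly scores highest, which is the behavior the abstract seems to be after: a sudden change alone isn't enough if hardly anyone is searching for the term.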
Examples of Increased Rankings for Pages Containing Bursty Search Terms
Imagine that there have been many recent searches at a search engine for the term [New York Yankees] because of a winter player trade involving the Yankees. While many people regularly search for the team at a search engine using the phrase [New York Yankees], as rumors of the trade start, the number of searches for [New York Yankees] shows a rapid increase. Even more frequent searches are made for the term [New York Yankees] if the trade is confirmed.
During this time, when the phrase [New York Yankees] has become very popular, someone searches for the term [baseball]. Because of the high frequency and the rapid increase in searches for [New York Yankees], pages that show up in search results in a search for [baseball] that also contain many mentions of the phrase “New York Yankees” may rank higher in the search for [baseball] than they otherwise would because of the recent great popularity of the phrase [New York Yankees].
In another example, imagine that a current event, such as a devastating earthquake, happens, and search query burstiness follows, with people searching for terms such as “seismograph,” “Richter scale,” and “tsunami” much more frequently. If there is a way to measure the rapid increase in popularity of those search terms (search query burstiness), pages that contain those terms might be given a boost in rankings when someone searches for the phrase [earthquake].
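The examples above could be sketched along these lines in Python. Everything here is hypothetical: the result format, the boost_factor value, and the idea of adding a boost in proportion to how often a bursty term appears on a page are assumptions I'm making to illustrate the idea, not details from the patent filing.

```python
def boosted_score(base_score, page_text, term_scores, boost_factor=0.01):
    """Add a boost for each mention of a bursty term on the page.

    term_scores maps a term to its recent usage score. boost_factor is a
    made-up tuning value that keeps the boost small next to base_score.
    """
    text = page_text.lower()
    boost = sum(
        score * text.count(term.lower())
        for term, score in term_scores.items()
    )
    return base_score + boost_factor * boost


def rerank(results, term_scores):
    """Reorder (url, base_score, page_text) results by boosted score."""
    return sorted(
        results,
        key=lambda r: boosted_score(r[1], r[2], term_scores),
        reverse=True,
    )


# Hypothetical results for a search on [baseball].
results = [
    ("a.example", 1.0, "Baseball scores and schedules."),
    ("b.example", 0.9, "Baseball news: New York Yankees trade talk. "
                       "The New York Yankees confirmed the deal."),
]
bursty_terms = {"New York Yankees": 68.0}
reranked = rerank(results, bursty_terms)
```

In this sketch the second page starts with a lower score, but it mentions the bursty phrase twice and moves ahead of the first page. With no bursty terms, the original order is left alone.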
I’m trying to figure this out. If pages containing “seismic” in them were to all be given a burst in rankings, supposedly it would be for searches involving the word “seismic”. But since all pages returned for that search include the word “seismic”, how would any of them be given a ranking advantage? I can see how they all would get more traffic, but how would any of them benefit in their actual rankings?
Hi David,
Thanks for asking. I have rewritten my examples. 🙂
Pages containing the term “seismic” might be given a boost in rankings if “seismic” suddenly has a rapid increase in people’s searches as seen in the search engine query log files.
If someone does a search for the term [tsunami] and there are pages in the results for that search that contain the term “seismic,” those pages might be given a boost in rankings for the [tsunami] search. The pages that contain the term “seismic” more often might be given a bigger boost than the pages that don’t contain it as much.
I also look at burstiness so I have a folder of papers on it – a good one is the following:
http://cgi.di.uoa.gr/~platakis/DiscoveringHotTopicsEUREKA2008.pdf
It looks at extending Kleinberg’s solution.
I think it’s good to say that burstiness is calculated on the entire corpus rather than on a single document. A bloggraph is created – I know you mentioned this 🙂
I don’t know if it’s a good idea to be honest, just on a personal note. I say this because Google sometimes gives me the latest blog post for a topic for example, and I have to use advanced search to get the right date range. I can’t think of an example right now, but I’ll update when I come across it again!
Interesting. Along with measuring the “burstiness” of a popular and timely keyword or phrase, search engines like Google, Yahoo and MS will have to be able to crawl the web faster and faster.
I have noticed in the past few months that Google has gone from crawling my pages every 1-2 weeks to crawling them every 2-3 days.
It’s all about freshness, as you have mentioned on here before, Bill.
Thanks, CJ.
Burstiness has been written about in the context of blogs in a few papers, but I believe the use described by Microsoft’s patent filing is for all documents on the Web, and not just blogs.
Hi People Finder,
It is about freshness, though in this case maybe indirectly.
Google does seem to get some content into their index pretty fast, and I think that’s been true for a good amount of time.
This idea may result in getting fresh content to rank higher in search results, but the focus really is on getting content that has suddenly become very popular (as seen in people’s searches) to be more visible to searchers. That popularity may coincide with showing content that is fresh because many people may be searching for something that involves current events. Or it may involve something that is seasonal, like a holiday or a recurring event. For instance, people may start searching a lot for Christmas recipes towards the end of December.
For something like this to work well, search engines do need to recrawl pages where content tends to change rapidly, so crawling rates would need to be faster, like you’ve seen.
Bill, I think I see what you are saying, although it boggles my mind how complex that could get. I find myself wondering if I had a fashion page and I wrote that a new line of hats will “have an impact of seismic proportions” on the hat industry, would my page temporarily get a boost? Interesting thought.
But it does suggest that we should be writing copy around both keywords and themes. So if we write copy around “tsunami”, we should also make sure that words like “seismic”, “seismograph”, “Richter scale”, “waves”, “rescue”, “Red Cross”, etc. are all on the page, correct? This is IMHO a wise approach anyway, just due to the natural long-tail search permutations people are looking for.
Hi David,
It could be a complex process, but not any more complex than some of the other things mentioned in search-related patent filings over the past few years.
In some ways, this reminded me of the Google phrase-based indexing patents/patent applications, in that it would require a search engine to look at phrases that appeared in the top “n” number of search results for a query, and reorder those results based upon the co-occurrence of those phrases within those results.
Both phrase-based indexing and this burstiness approach would benefit from writing copy that includes phrases related to the keywords being optimized for a page and to the theme or topic of that page. As you note, those phrases can also be beneficial from a long-tail search approach, which makes it a practice that may be well worth considering.
We don’t know for certain whether Google is using the processes described in the phrase-based indexing patent filings, and we don’t know if Microsoft is using this burstiness approach, but both are worth experimenting with, and even if they aren’t, the benefits of including possible long-tail terms by itself could make it something to try.
Google seems to be crawling much more often. A new site I’ve got up has been crawled 160 times by the Googlebot in just the last couple weeks.
Strange.
Jayden
Jayden – 160 times in a couple of weeks? That seems a bit excessive for the Googlebot!
Not sure how unique this is for Microsoft, as this is effectively how many sites already at least partially work, but in a slightly different way.
E.g. Technorati, Techmeme and Blogcatalog
It can be exploited in the opposite direction for an SEO benefit or just to present topical information faster to visitors.
Hi Jayden,
You may have a link or two from some sites that are crawled fairly frequently by Googlebot, and that may be why you’ve received so many visits.
Hi Andy,
Thank you. Appreciate your pointing out Techmeme, etc.
I think the major difference here may be that the Microsoft method is looking at query logs rather than content produced on pages, to boost pages that contain rapidly popular query terms. Those pages can even be older ones, that have been around for a while.
I could see the logic in your baseball example. With a big Yankees signing, someone searching for “baseball news” would likely find pages about the Yankees interesting.
But in the earthquake example people might be looking for an actual explanation of the Richter scale.
Hi Dave,
Thanks. Good question about my earthquake example. People searching for an example of the Richter scale would probably still be able to find it. This process likely wouldn’t make information resources like that disappear, but it might boost some pages a little (for a short while, like a few days) that might be relevant to topics related to a query that has seen a rapid increase in interest by a large number of people.
Very good example for the above patent. Which means this will be more useful for news websites, blogs and forums. I am saying these websites because these are places where we can get information about recent happenings, and where the terms related to those recent happenings will be made popular by the usage of the search term.
This will make ourselves updated with the latest news. Thanks for sharing this with us 🙂
Thank you, cdsseo.
I do think that the kinds of sites that will benefit the most from this are ones that try to keep up with timely topics, such as news sites and blogs and forums.
But even older sites, including static ones, might also be seen getting boosted in search results if the material on their pages suddenly becomes the subject of many searchers’ attention, as shown in rapidly increasing queries for a term.
I agree with William, it would be the blogs and news sites benefiting from this. It would be difficult for other static sites to just up and change overnight, going with the whim of whatever keyword is popular. Although I’m sure there are exceptions to this.
My other problem I see with it is that it’s going to be difficult to all of a sudden get ranked for one of these “popular” keywords overnight – SEO usually takes a lot of effort and many sites I’ve seen (at least on the mom and pop side of it) would not be able to successfully optimize in time before the number of searches went back down to normal.
Hi SEOsean,
News sites and blogs may be the biggest beneficiaries of this approach, but static sites could benefit from this as well, if they can anticipate some of the popular keywords ahead of time.
For example, setting up pages for specific holidays or certain events, a fair amount of time ahead of that holiday or event might enable you to take advantage of “burstiness.”
An example that I’ve seen a small restaurant in my town use is to offer special dinners on their web site, and on signs in their storefront, a month or more ahead of time, for events related to the local university. They draw pretty big crowds, with reservations made a month or more ahead of time, for things like homecoming, “parents day,” and graduation. Parents coming to visit their children for those events will look up the name of the school and those events, and find the restaurant website ranking well for those terms. They’re using static pages, but they’re also planning ahead intelligently.