When you have a heavily visited web site, a portal where people can buy and sell things, perform searches of the web, make selections of topics in a directory, create alerts on different topics, join groups, and perform many other activities, you might be able to tell a lot about the visitors to your site and their changing interests.
Or, at least you might if your analysis of your log files, your measure of user activity, and your reporting of that activity allow you to do so. Yahoo was granted a patent Tuesday on a monitoring system that would enable them to categorize those activities, and to track the topics and terms that searchers use or click upon.
This kind of buzz is referred to and defined in a number of ways within the patent, including the following:
In one embodiment of a traffic monitor, events are associated with topics or terms and are grouped by category. For example, when a user provides a search server with search terms and then selects a page from search results, the resulting page hit might be associated with one or more of the search terms used. When a user arrives at a particular page after navigating a subject directory, the page hit might be associated with the subject of the navigation. By comparing changes or trends in the traffic associated with a search term or a category, the “buzz” associated with a topic, term or category can be assessed.
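To make the idea in that passage a little more concrete, here is a minimal sketch of one way "buzz" could be measured. The passage quoted above only says that changes or trends in traffic are compared; the specific measure used here (the change in a term's share of all term-associated events between two periods) and the function names `term_shares` and `buzz_scores` are my own assumptions, not something taken from the patent.

```python
from collections import Counter


def term_shares(events):
    """Return each term's share of all term associations in one period.

    `events` is a list of page hits, each given as the list of terms
    (or categories) that the hit was associated with.
    """
    counts = Counter(term for event_terms in events for term in event_terms)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


def buzz_scores(previous_events, current_events):
    """Hypothetical buzz measure: change in traffic share per term.

    A positive score means the term makes up a larger share of traffic
    in the current period than it did in the previous one.
    """
    prev = term_shares(previous_events)
    curr = term_shares(current_events)
    return {
        term: curr.get(term, 0.0) - prev.get(term, 0.0)
        for term in set(prev) | set(curr)
    }
```

Using shares rather than raw counts keeps a term from looking "buzzy" just because overall traffic went up. That is a design choice of this sketch, and other normalizations would be just as plausible.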
While hanging out at one of the Las Vegas hotel bars during Pubcon a couple of weeks ago, one of the folks who I was talking with started getting a number of phone calls from Yahoo!. It was kind of late to call, but the calls came anyway. Seems he had set up Yahoo Alerts to contact him on his mobile phone when certain information was available.
I’ve used Yahoo and Google Alerts to track certain phrases in the news, and they can be a nice way of finding information that you otherwise might not have seen. If you haven’t used them, they are worth checking into. They can be helpful in tracking keyword phrases, doing a limited amount of reputation management, getting the latest sports scores, or finding out about traffic incidents on the route of your daily commute. Explore them a little, and you may find some creative uses for them.
I get my alerts by email, but I can see how it might be useful to get some on your phone.
Yahoo was granted a patent on alerts this week. Don’t know what impact, if any, that will have on services like the one offered by Google. The services offered by each search engine are different. Google provides alerts for information appearing in news, blogs, the web, and Usenet (groups). Yahoo supplies news keyword results, and alerts for a wide range of other services (stock quotes, sports results, traffic congestion information, horoscopes, and more), and they aren’t all keyword based the way that Google’s are. Yahoo also delivers alerts via email, instant messenger, and mobile phone, while Google’s are limited to email.
In the late 1990s, two versions of the original Google repository and search engine were created: a commercial one, which became the Google we know today, and an academic version, which was named Stanford WebBase.
In May 2006, a number of researchers connected with Stanford University submitted a paper to the ACM Transactions on Internet Technology which describes how the WebBase system works, and many of the experimental and performance results that led to the design decisions behind this system. The paper is Stanford WebBase Components and Applications (pdf).
While earlier projects used information gathered by WebBase, this is the first paper that actually looks at the WebBase system itself.
The main idea behind WebBase is to save researchers the time and effort of collecting Web data on their own, so that they can spend that time and effort on their research instead.
Does sorting commercial pages from informational pages provide value to searchers?
That’s one of the notions behind Yahoo’s Mindset (no longer available), which allows searchers to use a slider bar (see image at left) to rerank search results based upon whether a site is more commercial or informational. Yahoo! Mindset was released in May of 2005, and uses machine learning for text classification to sort top results for a query.
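Here is a rough sketch of how a slider like that could rerank results. It assumes each result already carries a relevance score and a "commercial" score between 0 and 1 from some text classifier. The `Result` fields, the `rerank` function, the slider range, and the `weight` parameter are all made up for illustration, and this is not a description of how Mindset itself worked.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float   # original ranking score (higher is better)
    commercial: float  # assumed classifier output: 0 = informational, 1 = commercial


def rerank(results, slider, weight=0.5):
    """Rerank results for a slider position in [-1, 1].

    -1 favors commercial pages, +1 favors informational pages, and 0
    leaves the original relevance ordering unchanged.
    """
    if not -1.0 <= slider <= 1.0:
        raise ValueError("slider must be between -1 and 1")

    def adjusted(result):
        # Map the commercial score to a lean in [-1, 1]:
        # +1 is fully informational, -1 is fully commercial.
        informational_lean = 1.0 - 2.0 * result.commercial
        return result.relevance + weight * slider * informational_lean

    return sorted(results, key=adjusted, reverse=True)
```

For example, with a shopping page scored at relevance 0.8 and an informational guide at 0.7, the shopping page stays on top with the slider at 0 or -1, and the guide moves ahead of it with the slider at +1.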
I included the kind of reorganization of search results done by Mindset in my post on 20 Ways Search Engines May Rerank Search Results back in October, but I didn’t have a link then to a patent or paper that might describe some of the processes behind how it might work. Since then, I’ve come across some interesting criticism of the use of sliders from Greg Linden, and his comparison of Yahoo Mindset with Yahoo Personalized Search.
I’ve also seen a newly released patent application from Yahoo which discusses ways to sort commercial from informational pages in search results:
If you use a toolbar with your browser, I’m curious about which one and why. Some new patent applications from Yahoo have me looking a little closer at the Yahoo toolbar.
I don’t have enough screen real estate on my monitor to load too many toolbars on the browser I use. That’s partially why I split my browsing time almost equally between Firefox and Internet Explorer, with a little Opera thrown in for good measure.
Right now, on Firefox, I’m using the Web Developer Toolbar from chrispederick.com and the StumbleUpon toolbar. On Internet Explorer, I have the Google Toolbar and the AIS Web Accessibility Toolbar.
The Web Developer toolbar and the Web Accessibility toolbar both have a lot of useful tools that make it easier for me to dig quickly through a website and see how it’s constructed and how it looks with and without JavaScript, CSS, and images. I can quickly see which links are on a page, whether or not it validates in HTML and CSS, what pages look like at different resolutions, and much more.
What are the differences between enterprise search and web search? Will developments in enterprise search someday enable search engines to be created that might index the web as well as, or better than, present web search engines?
IBM was granted a patent today on their Unstructured Information Management Architecture, which was made available to open source developers last summer. Sourceforge has more information about the open source nature of UIMA, as does IBM. IBM recently decided to move this open source development over to Apache.
Unstructured Information Management was the subject of an issue of the IBM Systems Journal in 2004, which contains some detailed articles on the topic. One by A. Z. Broder and A. C. Ciccolo, Towards the next generation of enterprise search technology, is highly recommended if you would like to get a grasp of the potential of this approach to indexing unstructured information. It describes some of the differences between enterprise search and web search, and provides summaries of the other articles in the issue. I found this snippet interesting:
The field of UIM may come full circle: while the unstructured search paradigm on the Web exploded in the consumer sphere before being adopted in the enterprise, we believe that the combination of semantic and linguistic annotations with unstructured search will follow the more conventional path of first being developed in the enterprise sphere before becoming pervasive in the Web world.
I gave a presentation on duplicate content at Pubcon yesterday. The panel I was on was well received, and Barry Schwartz covered the session at Search Engine Roundtable: Duplicate Content Issues (Yahoo & Google). Joe Duck also has some thoughts about the session: Pubcon Las Vegas – Duplicate Content Session with Google and Yahoo. I’ve had a chance to meet some new folks, and say hello to some old friends. The conference has been a real pleasure so far.
A new patent application from Microsoft describes a process that sounds very much like how Google Suggest works.
Pubcon starts this morning in Las Vegas, and I have to race off to register shortly.
I arrived yesterday, and toured around the city a little. The hotel and casino that I stayed at a few months ago, the Stardust, is now closed and has a big fence around it, with signs announcing that a public auction will be held later this week.
Noticed these patents had been granted to Google this morning:
Methods and apparatus for providing search results in response to an ambiguous search query
Inventors: Benjamin Thomas Smith, Sergey Brin, Sanjay Ghemawat, John Abraham Bauer
Assignee: Google, Inc.
United States Patent 7,136,854
Granted November 14, 2006
Filed: December 26, 2000