Imagine that you run a search engine, and you find a way to predict the outcomes of certain events fairly accurately from internet activity: browsing and search histories, clicks on pages in search results, actions taken in social networking applications, and so on. The events might include winners of American Idol, political election outcomes, weekend movie revenues, music album sales, attendance at sporting events, or television ratings for different shows.
What would you do with that power?
A Yahoo patent application published today explores how the search engine might use data about how people act on the Web to predict that kind of information.
Continue reading “Yahoo on How Internet Activity Can Predict Event Outcomes”
Your website may be invaded by robots at any time. If you want people to visit you from places like Google, Yahoo, or Bing, that makes you lucky, as long as the visiting robots are polite.
In the early days of the Web, automated programs known as robots, or bots, were created to find information on the Web and to build indexes of that information. They did this whether or not you wanted them to visit your pages, and you had no way to tell them to stay out of your website.
If you search through Usenet newsgroups from the early days of the Web, you might come across a document such as the World Wide Web Frequently Asked Questions (FAQ), Part 1/2 (December 1994), which describes robots in those days:
4.10: Hey, I know, I’ll write a WWW-exploring robot! Why not?
Continue reading “Google Patent Granted on Polite Web Crawling”
In February of 2010, Google purchased a social Q&A site, Aardvark. It seems like a great match for a couple of reasons. One is that a widely discussed paper from Aardvark, The Anatomy of a Large-Scale Social Search Engine (pdf), written by Damon Horowitz and former Googler Sepandar D. Kamvar, was, as its authors acknowledge, inspired by one of the early Google papers, The Anatomy of a Large-Scale Hypertextual Web Search Engine. Another is that Aardvark’s founders and senior team members include a number of former Google (and Yahoo) employees.
Instead of looking for web pages that might answer your questions, Aardvark lets you ask questions of people in your extended social network (and beyond), and lets you identify topics you might be interested in answering. There are a number of Question and Answer sites on the Web, such as Yahoo Answers, but they don’t quickly route questions to people who might be able to answer them; instead, they rely on someone happening upon your question.
A Yahoo patent application published this week explores a “communal search” system in which someone might get real-time responses to questions from people who might know the answers. People chosen to respond might be selected based on their location, the activities they participate in, or some relationship between the query and a location or time. The system may also attempt to answer queries automatically, based on previous questions and answers from others who have used it.
Continue reading “Yahoo’s Social Search Answer to Google’s Aardvark?”
Google’s recent purchase of Metaweb, which runs the Freebase directory, left many wondering about the motivations behind the acquisition. Did Google buy the company for its technology, for its Freebase directory, or for the expertise of its employees?
A Google patent application published today hints at one reason behind the deal, with a mention of Metaweb’s Freebase, and how it could be used by Google in a process that may expand the amount of information that the search giant shows us about specific people, places, and things (including ideas and concepts such as democracy) in search results.
It might also result in search results that are mashups of different information related to queries involving named entities, as seen in the image below:
Continue reading “Google and Metaweb: Named Entities and Mashup Search Results?”
When you enter a set of keywords into Google, the search engine attempts to find every page it can that contains those keywords, and returns a set of results ordered by a combination of relevance and importance scores. But many of the pages that could be returned for such a search may not be very good matches for the topic of the query terms, or may be spam pages.
According to a Google patent filed in 2006 and granted today, around 90 percent of web pages that could be returned for topics such as computer games, movies, and music are spam pages, which exist only to “misdirect traffic from search engines.” The patent tells us that those pages are usually unrelated to those “topics of interest” and try to get a visitor to purchase things such as pornography, software, or financial services.
The patent presents an automated process that might be used by the search engine to classify documents based in part upon user-behavior data, to help weed out web spam.
Continue reading “How Google Might Fight Web Spam Based upon Classifications and Click Data”