Interested in finding a blog about a particular topic or place, rather than individual blog posts? A recent change to Google’s blog search is intended to make it easier to do so. Interestingly, the timing of the change corresponds to the granting last month of a Google patent on how Google may index and retrieve blogs. Google announced the change on one of its blogs:
Recently, our blog search team made it much easier to find full blogs about your query, rather than single posts on the topic. This is especially useful if you’re looking for bloggers that post on an ongoing basis about the subject of your query.
— This week in search 8/27/10, The Official Google Blog.
The Google patent describes how the search engine might collect information from a blog’s feed and from the blog itself to attempt to understand what a blog might be about. It might collect information such as the content of posts, post titles, post authors, blog authors, the blog’s title, profile information about the authors, blog roll information, and possibly other information about the blog.
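To make the idea of gathering blog-level information from a feed concrete, here is a minimal sketch in Python. It is not the method from the patent. The sample feed, the `summarize_blog` function, and the choice of fields are all assumptions made up for illustration, and a real system would also look at the blog's pages, author profiles, and blogroll.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS feed, invented for this example.
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Virginia Gold Mining Notes</title>
    <item>
      <title>Visiting a mining camp museum</title>
      <dc:creator>Alice</dc:creator>
      <description>Notes on gold mining history.</description>
    </item>
    <item>
      <title>Panning for gold</title>
      <dc:creator>Bob</dc:creator>
      <description>How panning works.</description>
    </item>
  </channel>
</rss>"""

# Namespace prefix for the Dublin Core "creator" element used for post authors.
DC = "{http://purl.org/dc/elements/1.1/}"


def summarize_blog(feed_xml):
    """Collect blog-level signals (title, post titles, authors, text) from a feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = channel.findall("item")
    authors = {item.findtext(DC + "creator", default="") for item in items}
    return {
        "blog_title": channel.findtext("title", default=""),
        "post_titles": [item.findtext("title", default="") for item in items],
        "post_authors": sorted(authors - {""}),
        "post_text": [item.findtext("description", default="") for item in items],
    }


info = summarize_blog(SAMPLE_FEED)
print(info["blog_title"], info["post_authors"])
```

Pooling the fields from every post into one record is what lets a search engine treat the blog as a whole, rather than each post, as the thing being matched against a query.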
As much as I love exploring search engines and how they tick, sometimes it’s good to get away from behind the monitor and go exploring outdoors.
I’ve been writing recently about topics such as how search engines might mine data found on the Web and in their own log files to learn more about the intent behind searchers’ queries, but I learned a little about a different kind of mining this past weekend with a trip to a local Gold Mining Camp Museum.
The earliest history of gold mining in Virginia dates back to 1804, and miners dug ore out of Virginia’s mines until World War II, though many speculators moved out West during the California Gold Rush. In the early 1800s, Virginia and surrounding southern states were the major gold-producing region in the United States.
Might Google rank links to pages differently based on a perception of how related or affiliated those pages might be to each other? For instance, if three pages authored by the same person link to a fourth page, and two other pages, each written by other people, also link to that fourth page, should the three links from the same author count as passing along three times as much link weight as the links from the independently written pages?
A patent granted to Google today shows how the search engine might analyze how “affiliated” pages or sites are to each other, and how their degree of affiliation might influence the amount of weight passed along by each link.
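As a rough illustration of the question above, the sketch below discounts links that come from pages by the same author. It is a deliberately naive assumption, not the scoring described in the patent: pages are treated as either fully affiliated (same author) or fully independent, and an affiliated group of links shares a single link's worth of weight. The function name and the page and author labels are hypothetical.

```python
from collections import defaultdict


def affiliation_adjusted_weights(links, base_weight=1.0):
    """Return a weight per linking page.

    links: list of (source_page, author) pairs that all point at one target.
    Assumption: links from the same author split one base_weight among them.
    """
    pages_by_author = defaultdict(list)
    for page, author in links:
        pages_by_author[author].append(page)

    weights = {}
    for pages in pages_by_author.values():
        share = base_weight / len(pages)  # affiliated pages share one vote
        for page in pages:
            weights[page] = share
    return weights


# Three pages by one author plus two independently written pages.
links = [
    ("page-a1", "author-a"),
    ("page-a2", "author-a"),
    ("page-a3", "author-a"),
    ("page-b", "author-b"),
    ("page-c", "author-c"),
]
weights = affiliation_adjusted_weights(links)
print(weights)
print(sum(weights.values()))
```

Under this assumption the fourth page receives about three links' worth of weight in total rather than five, because the three same-author links together count as one. A graded measure of affiliation would fall somewhere between these two extremes.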
A new location-based service from Facebook is rolling out this week, known as Facebook Places. The announcement on the Facebook blog, Who, What, When, and Now…Where, describes Places as a way of letting your friends know where you are and what you’re doing in real time when you check in.
The Facebook blog post title caught my attention because of a patent granted earlier this month to Yahoo which collects “Who, What, When, and Where” information about people and the devices they use to connect directly or indirectly to the internet, including mobile phones, TV set top boxes, desktop and laptop computers, fax machines, radio frequency ID (RFID) tags, sensors, and other kinds of devices.
Real World Entities and the W4 COMN
Imagine that Yahoo started paying attention to information on the Web that isn’t normally crawled and indexed by search engine spiders, such as emails and TV set top box searches, location and application usage of mobile phones, social network interactions and physical and online locations, and many other kinds of devices and information flows that connect to and use the internet.
Does Google favor big brands when showing search results? That question has been bandied about on the Web for a while, but the answer may be more complicated than just a matter of brands.
The question arose this morning on Malcolm Coles’ blog, in his post Google treating brand names in search terms as site: searches? after Malcolm very astutely discovered certain sets of search results showing more than 2 results from the same domain.
Rather than just looking for brands, it’s more likely that Google is trying to understand when a query includes an entity – a specific person, place, or thing, and if it can identify an entity, that identification can influence the search results that you see.
I’ve written about the topic before, when Google was granted a patent named Query rewriting with entity detection back in May of 2009, which I covered in Boosting Brands, Businesses, and Other Entities: How a Search Engine Might Assume a Query Implies a Site Search.
When someone performs a search at a search engine, they tend to use only a handful of words or fewer to try to find information about a topic. That presents a search engine with the challenge of finding web pages and other results in response while also attempting to understand the intent behind that search.
If someone enters “new york pizza sunnyvale” (without the quotation marks) into a search box at Google or Yahoo or Bing, it’s not quite clear whether they are looking for: (1) pizza in New York, in a neighborhood or area referred to as Sunnyvale, (2) New York style pizza in a place called Sunnyvale, (3) a place called “New York Pizza,” in Sunnyvale, or (4) some other result.
One approach that could be followed to try to understand the intent behind a query like this is to break down the words in the query into entity types, and apply labels to those entities. With the “new york pizza sunnyvale” example, that could be done a few ways:
[new york pizza]/food [sunnyvale]/location
[new york pizza]/business [sunnyvale]/location
[new york]/location [pizza]/food [sunnyvale]/location
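The labelings above can be generated mechanically. The following is a minimal sketch, assuming a tiny hand-made lexicon that maps phrases to possible entity types. The `LEXICON` contents and the function names are hypothetical, and a real search engine would draw on far larger data and would also score the candidates rather than just list them.

```python
from itertools import product

# Hypothetical lexicon mapping known phrases to possible entity types.
LEXICON = {
    "new york pizza": ["food", "business"],
    "new york": ["location"],
    "pizza": ["food"],
    "sunnyvale": ["location"],
}


def segmentations(words):
    """Yield every way to split a list of words into contiguous phrases."""
    if not words:
        yield []
        return
    for i in range(1, len(words) + 1):
        head = " ".join(words[:i])
        for rest in segmentations(words[i:]):
            yield [head] + rest


def candidate_labelings(query):
    """List labelings in which every phrase is found in the lexicon."""
    results = []
    for phrases in segmentations(query.lower().split()):
        if all(phrase in LEXICON for phrase in phrases):
            # Try each combination of entity types for the phrases.
            for labels in product(*(LEXICON[p] for p in phrases)):
                results.append(
                    " ".join(f"[{p}]/{label}" for p, label in zip(phrases, labels))
                )
    return results


for labeling in candidate_labelings("new york pizza sunnyvale"):
    print(labeling)
```

With this lexicon the sketch produces the same three candidate labelings listed above. Choosing among them is the harder problem, and it is where signals about searcher intent would come in.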
Imagine that you run a search engine, and you find a way to predict the outcomes of certain events fairly closely based upon internet activity such as browsing and search histories, page clicks in search results, actions taken on social networking applications, and so on. The events might involve things such as winners of American Idol, political election outcomes, weekend movie revenues, music album sales, attendance for sporting events, or television ratings for different shows.
What would you do with that power?
A Yahoo patent application granted today explores how the search engine might use data about how people act on the Web to predict that kind of information.
Your website may be invaded by robots at any time. If you’re lucky, that is, at least if you want people to visit you from places like Google or Yahoo or Bing. And if the visiting robots are polite.
In the early days of the Web, automated programs known as robots, or bots, were created to find information on the Web, and to create indexes of that information. They would do this regardless of whether you wanted them to visit your pages or not, and you had no way to tell them not to go through your web site.
If you search through Usenet message boards from the early days of the Web, you might come across a document such as the World Wide Web Frequently Asked Questions (FAQ), Part 1/2 (December, 1994), which describes robots in those days:
4.10: Hey, I know, I’ll write a WWW-exploring robot! Why not?