Query Logs and the Slang of the Web

One way to help in that process of organizing the Web is to use what people do in the Web.

- Ricardo Baeza-Yates, from a presentation on Extracting Semantic Relations from Query Logs

How related might different search queries be when they share a number of pages in search results, and searchers tend to click upon those shared results more than other results?

If you go to Yahoo’s search, and perform a search for the term [wcca], the first result that you see in the search results is a page titled “Wisconsin Circuit Court Access.” If you search for [wisconsin circuit court], you’ll see the same page at the top of the search results. If many people searching for each of those terms tend to mostly click on the link for that page, and no other pages, it’s possible that Yahoo might start considering those query terms to be very closely related.

Because of that relationship, the search engine might start offering searchers a query suggestion for a related term at the top of the search results for an original query.
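As a rough illustration of that idea, the sketch below scores how much the clicks for two queries overlap. It is only a sketch under stated assumptions: the click counts and the example.org URLs are made up, the function name click_overlap is hypothetical, and a weighted Jaccard score is one simple choice of measure, not something described by Yahoo.

```python
def click_overlap(clicks_a, clicks_b):
    """Weighted Jaccard overlap between the clicked results of two queries.

    Each argument maps a result URL to the number of clicks it received
    for that query. Returns a value between 0.0 (no shared clicks) and
    1.0 (identical click patterns).
    """
    urls = set(clicks_a) | set(clicks_b)
    shared = sum(min(clicks_a.get(u, 0), clicks_b.get(u, 0)) for u in urls)
    total = sum(max(clicks_a.get(u, 0), clicks_b.get(u, 0)) for u in urls)
    return shared / total if total else 0.0


# Hypothetical click logs for the two queries discussed above.
wcca_clicks = {"example.org/court-access": 90, "example.org/other": 10}
court_clicks = {"example.org/court-access": 80, "example.org/courts-home": 20}

score = click_overlap(wcca_clicks, court_clicks)
```

A score near 1.0 would suggest that searchers for both queries are after the same page, which is the kind of signal that could support a query suggestion.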

Continue reading

Search Engines and Words with More Than One Meaning

Some words that you might search for at a search engine may have more than one meaning. For example, the word fencing can mean a sport involving swords, a man-made barrier enclosing an area, or the activity of making a profit from illegally gained goods. Words or phrases that can have two or more meanings are sometimes referred to as polysemous words.

That can pose challenges for:

  • Search engines – trying to identify the intent behind searches.
  • Searchers – seeing results unrelated to what they were trying to find.
  • Site owners – finding their pages in search results surrounded by sites offering something very different from what they offer.
  • Advertisers – who may bid on certain words or phrases as sponsored results for searchers who may have absolutely no interest in those ads.

If someone enters the word [fencing] into a search engine, the search results they see will likely be filled with pages related to all of the different meanings of the word, such as electric fences, local search maps for fencing companies, Olympic moments relived at the United States Fencing Association web site, the Wikipedia entry on Fence (criminal), and others.

Chances are that the person searching was only interested in finding information about one of those: the sport, the barrier, or the criminal activity.

Continue reading

How Search Engines Might Identify and Handle Soft 404s and Login-Required Pages

When people in the mideastern United States don’t hear something that someone says, they may say “excuse me,” to ask the person whom they are having a conversation with to repeat what they just said. If you’re having a conversation in the Southern United States and you say “excuse me” to get someone to repeat themselves, it might evoke a blank stare (I’ve seen it).

Non-verbal communication that doesn’t seem to match the message sent with words might also cause confusion and misunderstanding (been there, too).

Many websites are set up incorrectly. When a visitor or a search engine crawling program attempts to reach a URL that doesn’t exist on the site, they are redirected from that inaccessible URL to a dedicated error page that shows a 404 (not found), 403 (forbidden), or 5xx (server error) message on screen. Yet the status code in the header from the site’s server may be a “200” (OK), which indicates that there isn’t a problem, even though there is. Some pages are only inaccessible temporarily, such as when a database is down. When a server error shows for those, the status code sent from the server shouldn’t be a 200 (OK) either.

Sometimes visitors are redirected from inaccessible URLs to a site’s main homepage as well.
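To make the mismatch concrete, here is a minimal sketch of how a crawler might flag these "soft" errors. Everything in it is an assumption for illustration: the function name classify_response, the list of error phrases, and the homepage-redirect rule are hypothetical heuristics, not a description of how any search engine actually does this.

```python
from urllib.parse import urlparse

# Hypothetical phrases that often appear on error pages.
ERROR_PHRASES = ("page not found", "403 forbidden", "server error")


def classify_response(requested_url, final_url, status_code, body):
    """Label a fetched page as 'hard error', 'soft error', or 'ok'.

    requested_url: the URL the crawler asked for
    final_url:     the URL reached after following any redirects
    status_code:   the HTTP status code of the final response
    body:          the text of the final page
    """
    # The server reported the problem honestly in its status code.
    if status_code >= 400:
        return "hard error"

    # The status says OK, but the page text reads like an error page.
    text = body.lower()
    if any(phrase in text for phrase in ERROR_PHRASES):
        return "soft error"

    # A request for a deeper page that ended up on the homepage.
    requested_path = urlparse(requested_url).path
    final_path = urlparse(final_url).path
    if requested_path not in ("", "/") and final_path in ("", "/"):
        return "soft error"

    return "ok"
```

For example, a request for http://example.com/missing that ends on an error page served with a 200 status would be labeled "soft error", while the same page served with a real 404 status would be labeled "hard error".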

Continue reading

How Google May Rate Raters

In my last post, I wrote about how Google may be incorporating Sentiment Analysis into the snippets that they show for some search results. Another new feature that was announced at Google’s Searchology was the display of user ratings for products on some pages. We were told that these reviews can be found in “rich snippets,” which show up under the title of a page in a search result, and above the snippet or description for a page.

A recent patent application from Google explores the topic of ratings, assigning quality scores to raters, and discounting or eliminating ratings for dishonest or malicious raters. It made sense to look a little more closely at the ratings that now appear in “rich snippets” and spend some time with the patent filing to see if it might impact how ratings might be shown in the future.
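One simple way to picture discounting or eliminating ratings is a weighted average. The sketch below is an assumption made for illustration, not the method from the patent filing: the quality scores between 0 and 1, the min_quality cutoff, and the function name weighted_rating are all hypothetical.

```python
def weighted_rating(ratings, min_quality=0.2):
    """Average ratings, weighting each one by its rater's quality score.

    ratings:     list of (rating, rater_quality) pairs, where
                 rater_quality is a hypothetical score from 0.0 to 1.0
    min_quality: raters scoring below this are dropped entirely
    Returns the weighted average, or None if no ratings remain.
    """
    # Eliminate ratings from raters judged too unreliable.
    kept = [(rating, quality) for rating, quality in ratings
            if quality >= min_quality]

    total_weight = sum(quality for _, quality in kept)
    if total_weight == 0:
        return None

    # Discount the remaining ratings in proportion to rater quality.
    return sum(rating * quality for rating, quality in kept) / total_weight


# Made-up example: two trusted raters and one low-quality rater.
sample = [(5, 1.0), (4, 0.5), (1, 0.1)]
overall = weighted_rating(sample)
```

In this made-up example, the one-star rating from the low-quality rater is dropped, and the four-star rating counts half as much as the five-star one.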

In a search for [new york seafood restaurants], I found one result from Yelp that showed an overall ranking, number of reviews, and an indication of how expensive the restaurant listed might be:

Continue reading

Google’s New Review Search Option and Sentiment Analysis

Sentiment: a general feeling, opinion, personal judgment, or sense about something.

At Google’s recent Searchology presentation, one of the new features described as being used by Google was sentiment analysis.

In the recap of the event from Google’s Matt Cutts, he tells us that:

If you sort by reviews, Google will perform sentiment analysis and highlight interesting comments.

I’ve seen a number of papers from Google on sentiment analysis, and a recent patent filing, so I decided to look closer at some of those review search results.

Continue reading

Search Engine Robots Sharing Cookies?

There’s a little park straddling Delaware and Maryland which has a monument marking the boundary between the states. Etched across the top of the stone marker is a line that indicates the separation between the states, and shows the point where an arc starts, which separates Delaware from Pennsylvania. If you look at a map of the border, you’ll see that the top of the state of Delaware is an arc shape that measures 12 miles from a cupola on top of a courthouse in Historic New Castle, Delaware. The arc between Delaware and Pennsylvania was defined in a deed to William Penn from the Duke of York in 1682. Maryland’s territory was also involved in the setting of borders.

You can hop atop the marker and sit on the state line if you’d like. The monument is surrounded by woods, and you have to travel down a path in the park to reach it.

We take the surveying of such lines, between states, between countries, surrounding towns and cities and counties for granted, as well as the exploration and discovery of the places where we live. The programs that search engines use to discover new pages on the Web and revisit old pages are a little like those explorers and surveyors – finding material online to add to their indexes so that we can explore those indexes and search for information and pages hosted on servers scattered around the globe.

Continue reading

Google on Measuring Impressions of In-Game Advertisements

I remember reading a Stephen King novel a few years back, and getting to a point where one of the characters in the book grabbed a Coke to quench his thirst. There was no reason to mention a brand name in the story – it didn’t add to the plot, it didn’t make the story seem more realistic, and it felt like the novelist only included the brand name of the soft drink because he may have been paid to do so. I have no idea whether or not that’s actually the case, but it really lessened my appreciation of the novel.

In the world or universe of a game, someone driving down a freeway might see billboards on the side of the road that contain actual advertisements. Storefronts may carry signs, and recognizable buildings, logos, and products may appear within games during play. I wrote about some of the possible implementations of in-game advertising that Google discussed in a patent filing in a post titled Google Games Patent Filing on Targeted Advertisements.

A new patent filing from Google discusses how they might track and measure “impressions” of ads actually placed within a game.

Continue reading