Semantic Relations from Query Logs

One way to help in that process of organizing the Web is to use what people do in the Web.

– Ricardo Baeza-Yates, from a presentation on Extracting Semantic Relations from Query Logs

How related might different search queries be when they share a number of pages in search results, and searchers tend to click upon those shared results more than other results?

If you go to Yahoo’s search and perform a search for the term [wcca], the first result that you see in the search results is a page titled “Wisconsin Circuit Court Access.” If you search for [wisconsin circuit court], you’ll see the same page at the top of the search results. If many people searching for each of those terms tend to mostly click on the link for that page, and no other pages, it’s possible that Yahoo might start considering those query terms to be very closely related.

Because of such semantic relations, the search engine might start offering searchers a query suggestion for a related term at the top of the search results for an original query.
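As a rough illustration of the idea above, the sketch below measures how much the clicked results for two queries overlap. It is a minimal sketch under stated assumptions, not Yahoo's method: the log format, the example URLs, and the function names are hypothetical, and Jaccard overlap is simply a common set-similarity measure swapped in for illustration.

```python
from collections import defaultdict


def clicked_urls_by_query(click_log):
    """Group clicked URLs by the query that led to each click."""
    clicks = defaultdict(set)
    for query, url in click_log:
        clicks[query].add(url)
    return clicks


def click_overlap(clicks, query_a, query_b):
    """Jaccard overlap of the URL sets clicked for two queries (0.0 to 1.0)."""
    a = clicks.get(query_a, set())
    b = clicks.get(query_b, set())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical log entries: (query, clicked URL). Not real data.
sample_log = [
    ("wcca", "https://example.org/court-access"),
    ("wisconsin circuit court", "https://example.org/court-access"),
    ("wisconsin circuit court", "https://example.org/court-directory"),
    ("fencing", "https://example.org/fence-supplies"),
]

clicks = clicked_urls_by_query(sample_log)
print(click_overlap(clicks, "wcca", "wisconsin circuit court"))
print(click_overlap(clicks, "wcca", "fencing"))
```

A search engine working along these lines might treat a high overlap as a hint that two queries are related, and a query pair above some chosen threshold could become a candidate for a query suggestion.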

Continue reading “Semantic Relations from Query Logs”

Search Engines and Polysemous Words

Some words or phrases that you might search for at a search engine have two or more meanings and are known as polysemous. For example, the word fencing can mean a sport involving swords, a man-made barrier enclosing an area, or the activity of profiting from illegally gained goods.

Polysemous words can pose challenges for:

  • Search engines – trying to identify the intent behind searches.
  • Searchers – seeing results unrelated to what they were trying to find.
  • Site owners – finding their pages in search results surrounded by sites offering something very different from what they offer.
  • Advertisers – who may bid on certain words or phrases as sponsored results for searchers who may have absolutely no interest in those ads.

If someone enters the word [fencing] into a search engine, the search results they see will likely be filled with pages related to all of the different meanings of the word, such as electric fences, local search maps for fencing companies, Olympic moments relived at the United States Fencing Association web site, the Wikipedia entry on Fence (criminal), and others.

Continue reading “Search Engines and Polysemous Words”

How Search Engines Might Identify and Handle Soft 404 pages and Login-Required Pages

When people in the Mideastern United States don’t hear something that someone says, they may say “excuse me” to ask the person they are talking with to repeat what was just said. If you’re having a conversation in the Southern United States and you say “excuse me” to get someone to repeat themselves, it might evoke a blank stare (I’ve seen it).

Non-verbal communication that doesn’t seem to match the message sent with words might also cause confusion and misunderstanding (been there, too).

Many websites are set up incorrectly. When a visitor or a search engine crawling program attempts to reach a URL that doesn’t exist on such a site, it is redirected from that inaccessible URL to a dedicated error page that shows a 404 (not found), 403 (forbidden), or 5xx (server error) message on screen. Yet the status code in the header from the site’s server may be a “200” OK message, which indicates that there isn’t a problem, even though there is. Some pages are only inaccessible temporarily, such as when a database is down. When a server error shows for those, the status code sent from the server shouldn’t be a 200 (OK) either.

Sometimes visitors are redirected from inaccessible URLs to a site’s main homepage as well.
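The mismatch described above can be sketched as a simple check on a response that has already been fetched. This is a minimal sketch under stated assumptions, not how any search engine is known to work: the function name, the phrase list, and the homepage-redirect rule are all hypothetical, and the check only makes sense when the requested URL is one that should not exist on the site.

```python
from urllib.parse import urlsplit

# Hypothetical phrases that often appear on error pages.
ERROR_PHRASES = ("not found", "page does not exist", "forbidden", "server error")


def looks_like_soft_error(requested_url, final_url, status_code, body_text):
    """Guess whether a 200 response is hiding an error page or a homepage redirect."""
    if status_code != 200:
        # The server reported the problem in its status code.
        return False

    requested = urlsplit(requested_url)
    final = urlsplit(final_url)

    # A deep URL that ends up on the same site's homepage.
    redirected_to_home = (
        final.netloc == requested.netloc
        and final.path in ("", "/")
        and requested.path not in ("", "/")
    )

    # Error wording in the page body despite the 200 status.
    shows_error_text = any(phrase in body_text.lower() for phrase in ERROR_PHRASES)

    return redirected_to_home or shows_error_text


print(looks_like_soft_error(
    "https://example.com/no-such-page",
    "https://example.com/error",
    200,
    "Sorry, page Not Found",
))
```

A crawler might probe a site with a URL that is very unlikely to exist and pass the result to a check like this one. Phrase matching is crude and can misfire on legitimate pages that discuss errors, so a real system would likely need stronger signals.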

Continue reading “How Search Engines Might Identify and Handle Soft 404 pages and Login-Required Pages”

How Google May Rate Raters

In my last post, I wrote about how Google may be incorporating Sentiment Analysis into the snippets that it shows for some search results. Another new feature that was announced at Google’s Searchology was the display of user ratings for products on some pages. We were told that these reviews can be found in “rich snippets,” which show up under the title of a page in a search result, and above the snippet, or description, for a page.

A recent patent application from Google explores the topic of ratings, assigning quality scores to raters, and discounting or eliminating ratings for dishonest or malicious raters. It made sense to look a little more closely at the ratings that now appear in “rich snippets” and spend some time with the patent filing to see if it might impact how ratings might be shown in the future.
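One way to picture discounting or eliminating ratings based on rater quality is a weighted average. The sketch below is only an illustration under stated assumptions and is not taken from the patent filing: the function name, the 0-to-1 quality scale, the default quality for unknown raters, and the cutoff value are all hypothetical.

```python
def weighted_rating(ratings, rater_quality, min_quality=0.2, default_quality=0.5):
    """Average ratings weighted by rater quality, ignoring raters below a cutoff.

    ratings: list of (rater_id, score) pairs.
    rater_quality: dict mapping rater_id to a quality score between 0 and 1.
    Returns None when no rating survives the cutoff.
    """
    total = 0.0
    weight_sum = 0.0
    for rater, score in ratings:
        # Unknown raters get an assumed middle-of-the-road quality.
        quality = rater_quality.get(rater, default_quality)
        if quality < min_quality:
            continue  # eliminate ratings from raters judged untrustworthy
        total += quality * score
        weight_sum += quality
    return total / weight_sum if weight_sum else None


# Hypothetical raters and scores, for illustration only.
ratings = [("rater_a", 5), ("rater_b", 1)]
print(weighted_rating(ratings, {"rater_a": 1.0, "rater_b": 0.1}))
print(weighted_rating(ratings, {"rater_a": 1.0, "rater_b": 1.0}))
```

In the first call the low-quality rater is dropped entirely, so only the trusted rating counts. In the second call both raters carry equal weight and the result is a plain average.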

In a search for [new york seafood restaurants], I found one result from Yelp that showed an overall rating, the number of reviews, and an indication of how expensive the listed restaurant might be:

Continue reading “How Google May Rate Raters”

Google’s New Review Search Option and Sentiment Analysis

Sentiment: a general feeling, opinion, personal judgment, or sense about something.

At Google’s recent Searchology presentation, one of the new features described as being used by Google was sentiment analysis.

In the recap of the event from Google’s Matt Cutts, he tells us that:

If you sort by reviews, Google will perform sentiment analysis and highlight interesting comments.

I’ve seen a number of papers from Google on sentiment analysis, and a recent patent filing, so I decided to look closer at some of those review search results.

Continue reading “Google’s New Review Search Option and Sentiment Analysis”