How a Search Engine Might Rank Videos Based Upon Video Content

Chances are that when you search for a video on Google or at YouTube, the results you receive are based upon text about the video rather than the content of the video itself. The search algorithm involved might look at the title of the video, as well as a description and tags entered by the person who uploaded it. Annotations on the video may also help determine which terms and phrases the video is considered relevant for.

For example, the video below announces Google’s new recipe search option and provides a detailed description of the new feature. But none of the text accompanying the video mentions that the person explaining Google’s added functionality is one of Google’s executive chefs, Scott Giambastiani. If you search for [Google executive chef], you won’t see this video in YouTube’s search results, though arguably you should.
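To see why the chef’s role is invisible to this kind of ranking, here is a minimal sketch of a metadata-only scorer. It assumes the engine sees only the uploader’s title, description, and tags; the field weights and the sample video’s metadata are hypothetical, not taken from YouTube or from the actual video.

```python
from dataclasses import dataclass, field

# Hypothetical field weights: matches in the title count most, then tags,
# then the free-text description.
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "description": 1.0}


@dataclass
class Video:
    """Only the text the uploader typed; nothing from the audio or frames."""
    title: str
    description: str
    tags: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,:;!?()\"'").lower() for w in text.split())
    return {w for w in words if w}


def field_tokens(video: Video) -> dict[str, set[str]]:
    return {
        "title": tokenize(video.title),
        "tags": tokenize(" ".join(video.tags)),
        "description": tokenize(video.description),
    }


def score(video: Video, query: str) -> float:
    """Add the field weight for every query term found in each field."""
    total = 0.0
    for term in tokenize(query):
        for name, tokens in field_tokens(video).items():
            if term in tokens:
                total += FIELD_WEIGHTS[name]
    return total


def matches_all_terms(video: Video, query: str) -> bool:
    """True only if every query term appears somewhere in the metadata."""
    all_tokens = set().union(*field_tokens(video).values())
    return tokenize(query) <= all_tokens


# Hypothetical metadata, loosely modeled on a recipe-search announcement.
demo = Video(
    title="Recipes in Google Search",
    description="Filter recipe results by ingredients, cook time, and calories.",
    tags=["google", "recipes", "search"],
)
```

Because the words “executive” and “chef” appear nowhere in that text, no tuning of the weights would surface the video for [Google executive chef]; only something that looks at the content of the video itself could.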

Continue reading

How a Search Engine Might Identify the Functions of Blocks in Web Pages to Improve Search Results

There have been a number of patent filings and whitepapers from the major search engines over the past 5 or 6 years that describe how they might break a page into blocks or segments in order to understand things like the main topic or topics of a page and which block might be the most important, to decide what to show on the smaller screens of mobile devices, and to apply different weights to links depending upon which block they appear in.
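The last idea in that list, weighting links by the block they appear in, can be sketched in a few lines. This assumes the page has already been segmented and each block labeled (main content, sidebar, navigation, footer); the block labels, the weights, and the sample page are hypothetical choices for illustration, not values from any of those patent filings.

```python
from collections import defaultdict

# Hypothetical weights per block type (not from any patent).
BLOCK_WEIGHTS = {"main": 1.0, "sidebar": 0.4, "navigation": 0.2, "footer": 0.1}
DEFAULT_WEIGHT = 0.5  # used for block types the table doesn't know


def link_weights(blocks):
    """Weight each outbound URL by the block it appears in.

    `blocks` is an already-segmented page: a list of (block_type, [urls]).
    A URL linked from several blocks keeps the largest weight it receives.
    """
    weights = defaultdict(float)
    for block_type, urls in blocks:
        w = BLOCK_WEIGHTS.get(block_type, DEFAULT_WEIGHT)
        for url in urls:
            weights[url] = max(weights[url], w)
    return dict(weights)


def normalized(weights):
    """Scale weights so they sum to 1, e.g. to split a page's link value."""
    total = sum(weights.values())
    return {url: w / total for url, w in weights.items()} if total else {}


# Hypothetical segmented page.
page = [
    ("navigation", ["/home", "/about"]),
    ("main", ["https://example.org/study"]),
    ("footer", ["/about", "/privacy"]),
]
```

Keeping the maximum weight when a URL appears in more than one block is one design choice among several; summing the weights instead would reward links that are repeated across a page, which is usually the opposite of what block-level weighting is trying to achieve.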

I’ve written about a number of those patent filings and papers in the past, in posts such as:

Continue reading

How Google Might Offer Face Search by Using Pictures from Social Networks

If Google decided to include facial recognition search as part of the Visual Search described in a Google patent application a couple of weeks ago, the search engine would need to address two questions.

A Twitter followers page for George Washington. Happy President's Day.

One is: where would they get the pictures to power that facial recognition software (hint: see the image above)? The other is: how would they best avoid privacy concerns?

A patent filing from last week provides some possible answers.

Continue reading

More Ways a Search Engine Might Identify Synonyms to Expand Queries With

A couple of years back, Google was granted a patent on an approach to identifying synonyms by looking at and comparing queries that searchers used to find information. The patent was Determining query term synonyms within query context, and I covered it in my post How Google May Expand Searches Using Synonyms for Words in Queries.

A month or so after that patent was granted and I wrote my post, Google researcher Steven Baker published a post on the Official Google Blog titled Helping computers understand language, in which he announced that Google would start including synonyms for query terms in search results when the search engine judged a synonym to be a good match for a query term.

A mechanic working on a car (or auto).
Car Mechanic or Auto Mechanic or Both?
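The query-comparison idea behind that patent can be sketched roughly as follows. This is a simplified illustration, not the patent’s actual method: it assumes a small hypothetical query log mapping each query to the results searchers clicked, and treats two words as synonyms in a given context when swapping one for the other leaves the rest of the query unchanged and the clicked results overlap (Jaccard similarity) at or above a hypothetical threshold of 0.4.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical query log: query -> set of result URLs searchers clicked.
QUERY_LOG = {
    "car mechanic": {"a.com", "b.com", "c.com"},
    "auto mechanic": {"a.com", "b.com", "d.com"},
    "car insurance": {"e.com", "f.com"},
    "auto insurance": {"e.com", "f.com", "g.com"},
    "car seat": {"h.com"},
    "auto seat": {"x.com"},
}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def find_synonyms(log, threshold=0.4):
    """Return {(term, context): {synonyms}} from single-word substitutions.

    Two equal-length queries that differ in exactly one position are compared;
    if their clicked results overlap enough, the differing words are treated
    as synonyms *in the context of* the remaining words.
    """
    synonyms = defaultdict(set)
    for q1, q2 in combinations(log, 2):
        t1, t2 = q1.split(), q2.split()
        if len(t1) != len(t2):
            continue
        diffs = [i for i, (x, y) in enumerate(zip(t1, t2)) if x != y]
        if len(diffs) != 1:
            continue
        i = diffs[0]
        context = tuple(t1[:i] + t1[i + 1:])
        if jaccard(log[q1], log[q2]) >= threshold:
            synonyms[(t1[i], context)].add(t2[i])
            synonyms[(t2[i], context)].add(t1[i])
    return dict(synonyms)


def expand(query, synonyms):
    """Rewrite each word as an OR group when a synonym is known in context."""
    terms = query.split()
    out = []
    for i, term in enumerate(terms):
        context = tuple(terms[:i] + terms[i + 1:])
        alts = sorted(synonyms.get((term, context), set()))
        out.append("(" + " OR ".join([term] + alts) + ")" if alts else term)
    return " ".join(out)
```

Here “car” and “auto” pair up next to [mechanic] and [insurance] but not next to [seat], where the clicked results differ; that is the sense in which a synonym is determined within query context. Using the exact remaining words as the context keeps the sketch short, but a real system would need to generalize beyond queries it has already seen.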

Continue reading

The Future of Google’s Visual Phone Search?

Google Goggles lets you search by taking a picture of landmarks, books, business cards, artwork, product labels, logos, and text. It can use optical character recognition to turn text in an image into text it can search for on the Web, read barcodes, and find similar images in databases of artwork, landmarks, and other subjects. But Google Goggles only scratches the surface of what a phone-based visual search could offer.

A Google patent application published this week shows us what Google’s visual search for phones might evolve into. When you take a picture of a city street, your picture may include buildings, street signs, people’s faces, cars, and many other objects. If you send that picture as a query, the search engine might break the image into parts, search for many of the objects in it, and give you a mix of search results based upon all of those parts.
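The “break the image into parts, search each part, and blend the results” step might look something like the sketch below. It assumes the hard part, detecting and recognizing regions in the photo, has already happened; the Region type, its labels, and the search_region stand-in for per-type backends (OCR, landmark matching, and so on) are all hypothetical. Results are interleaved round-robin, visiting regions in order of detector confidence, which is just one plausible way to produce a mixed result list.

```python
from dataclasses import dataclass


@dataclass
class Region:
    kind: str          # hypothetical detector label, e.g. "text", "building"
    confidence: float  # detector confidence in [0, 1]
    payload: str       # whatever that region's recognizer extracted


def search_region(region):
    """Hypothetical stand-in for a per-kind search backend: 3 fake results."""
    return [f"{region.kind}:{region.payload}#{i}" for i in range(1, 4)]


def blended_results(regions, limit=5):
    """Search each region separately, then interleave the result lists,
    taking one result per region per round, most confident region first."""
    ordered = sorted(regions, key=lambda r: r.confidence, reverse=True)
    lists = [search_region(r) for r in ordered]
    merged, seen = [], set()
    rank = 0
    while len(merged) < limit and any(rank < len(lst) for lst in lists):
        for lst in lists:
            if rank < len(lst) and lst[rank] not in seen:
                seen.add(lst[rank])
                merged.append(lst[rank])
                if len(merged) == limit:
                    break
        rank += 1
    return merged


# Hypothetical regions detected in a street photo.
street = [
    Region("building", 0.6, "Flatiron"),
    Region("text", 0.9, "Main St"),
    Region("car", 0.3, "sedan"),
]
```

Round-robin interleaving guarantees every detected object gets at least one result near the top, which matches the idea of a mix of results; a score-based merge would instead let one dominant object crowd out the rest.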

The patent filing is:

Continue reading

Using Images in Blog Posts

I often write large walls of text, rarely adding images to the words that I post to these pages, and I think that’s a mistake. A meaningful image can draw the eye, capture the imagination, and often explain more in a single glance than hours of reading and reflection.

Imagine if Babe Ruth had kept a blog during his days of home runs and hot dogs, shattering hitting records and displaying a larger-than-life personality. Babe Ruth was one of the greatest pitchers of his time, and then one of the greatest hitters, and when someone excels at a sport, they’re often referred to as “the Babe Ruth of __________.”

Baseball can be broken down into moments of drama and individual confrontations, such as a pitcher and batter facing off against each other: the pitcher striving to push or sneak or cajole a ball past the hitter, and the batter attempting to impose his will with bat on ball. Ruth was an incredible talent, and a single look at his eyes can give you a sense of how he intimidated the strikeout artists of his era.

An image of George Herman (Babe) Ruth, Jr., posing with bat in hand and malice towards baseballs in his heart.

Continue reading

Google Patents, Updated

If you took a look at Google’s patent portfolio recently, you might ask yourself, “What kind of company is this?” Is it a search engine or a smartphone company, a memory module manufacturer or a server maker? Does this company really own the rights to a weight loss patent titled “Method Of Assaying Satiety Enhancing Tastants,” or was that listed by mistake at the patent office?

Google acquired a number of patents over the past few years, either by purchase or by license. Those include a good number of phone-related patents from Verizon, patents involving video and streaming data from IBM, and hardware-related patents from patent holding companies. A few of the IBM patents are the kind you might license if you wanted to develop self-driving cars. There’s been a lot of discussion about Google’s many acquisitions of the past year, with 40 mentioned in its September 30, 2010 10-Q filing with the SEC, and a few more since then. But Google’s acquisition of 77 granted patents from Verizon and another 51 granted patents from IBM happened with absolutely no media attention, as far as I can tell.

I’ve listed Google’s granted patents below, by category, and then by the name of the company that made the assignment of the patents to Google.

Continue reading

Document Level Classifiers and Google Spam Identification

There have been a number of news and opinion pieces and blog posts appearing on the Web in recent months telling us that Google has become less useful because of web spam from pages that scrape content from other sites, as well as from low-quality articles on content farms. Google’s head of Web Spam, Matt Cutts, responded to those criticisms by announcing some new efforts at Google to keep those kinds of pages from ranking as well in search results. From the Official Google Blog, on January 21, 2011:

As we’ve increased both our size and freshness in recent months, we’ve naturally indexed a lot of good content and some spam as well. To respond to that challenge, we recently launched a redesigned document-level classifier that makes it harder for spammy on-page content to rank highly.

The new classifier is better at detecting spam on individual web pages, e.g., repeated spammy words—the sort of phrases you tend to see in junky, automated, self-promoting blog comments.
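Google hasn’t published how its document-level classifier works, but a single feature in the spirit of “repeated spammy words” can be sketched. The code below measures how much of a page’s text consists of repeated three-word phrases and feeds that into a logistic function. The features, weights, and bias are hypothetical and hand-set; a real classifier would learn them from labeled pages and use many more signals.

```python
import math
from collections import Counter


def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def repetition_features(text, n=3):
    """Two simple repetition signals computed over word n-grams."""
    words = [w.strip(".,!?:;").lower() for w in text.split()]
    words = [w for w in words if w]
    grams = ngrams(words, n)
    if not grams:
        return {"top_ngram_share": 0.0, "distinct_ratio": 1.0}
    counts = Counter(grams)
    top = counts.most_common(1)[0][1]
    return {
        # share of all n-grams taken up by the single most common one
        "top_ngram_share": top / len(grams),
        # fraction of n-grams that are distinct; low when phrases repeat
        "distinct_ratio": len(counts) / len(grams),
    }


# Hypothetical hand-set logistic weights (not learned, not Google's).
WEIGHTS = {"top_ngram_share": 8.0, "distinct_ratio": -4.0}
BIAS = 1.0


def spam_probability(text):
    """Logistic score in (0, 1); higher means more repetitive text."""
    feats = repetition_features(text)
    z = BIAS + sum(WEIGHTS[k] * v for k, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))
```

For text made of the phrase “cheap watches buy now” repeated ten times, the estimate lands above 0.9, while an ordinary sentence with no repeated three-word phrases lands below 0.1. Working at the level of the whole document, rather than single words, is what lets a feature like this notice phrase-level repetition.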

Continue reading