How a Search Engine Might Rank Videos Based Upon Video Content

Chances are that when you search for a video on Google or at YouTube, the results that you receive are based upon text about the video rather than the content of the video itself. The search algorithm involved might look at the title of the video, as well as a description and tags entered by the person who uploaded it. Annotations on the video may also play a role in determining which terms and phrases the video is considered relevant for.

For example, the video below announces Google’s new food recipe search option, and provides a detailed description about the new feature. But none of the text accompanying the video mentions that the person providing details about Google’s added functionality is one of Google’s executive chefs, Scott Giambastiani. If you search for [Google executive chef], you wouldn’t see this video appear in YouTube’s search results, and you probably should.

Other factors may play a role in how highly ranked a video could be in search results as well, including things like how many views and comments and likes the video has received, how often it was added to a playlist, and more.

There are some problems with relying upon just the textual content that is associated with a video. One is that a description probably doesn’t do a very good job of describing a long video that might contain a large number of scenes and a variety of content. Another is that on a site containing a lot of videos, the number of results returned in response to a query may be very large, which makes it harder to pick out the best match.

While the search engine might show a screenshot taken from the first frame, center frame or last frame of the video to help people decide which video might best meet their query, that thumbnail might not be very representative of the actual content of the video itself either.

All of those ignore the actual content of the video itself. What if a search engine could use the actual audio and visual content of the video to decide what search terms it might be relevant for?

It might be easier to get an idea of what the video is about if the search engine created an index from videos that would store keyword association scores between frames of a number of videos and keywords associated with those frames.

Those frames might be associated with keywords based upon what’s contained within images or audio on each video. Google may also choose to use images from those frames as thumbnails shown in search results instead of just choosing the first, middle, or last frame of the videos.
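To make the idea of a frame-level index more concrete, here is a minimal sketch in Python. It is not taken from the patent filing; the class name `FrameKeywordIndex`, the video IDs, the frame times, and the scores are all made up for illustration. It stores a keyword association score for each frame, returns one result per video for a query, and uses the best-scoring frame as the suggested thumbnail.

```python
from collections import defaultdict


class FrameKeywordIndex:
    """Toy index of keyword association scores for video frames.

    Hypothetical structure for illustration only; the patent filing
    does not spell out a concrete data layout.
    """

    def __init__(self):
        # keyword -> list of (video_id, frame_time_seconds, score)
        self._entries = defaultdict(list)

    def add(self, video_id, frame_time, keyword, score):
        """Record that a frame of a video is associated with a keyword."""
        self._entries[keyword.lower()].append((video_id, frame_time, score))

    def search(self, keyword):
        """Return one (video_id, thumbnail_time, score) per video.

        Each video is represented by its best-scoring frame for the
        keyword, and videos are ordered by that score, highest first.
        """
        best = {}
        for video_id, frame_time, score in self._entries.get(keyword.lower(), []):
            if video_id not in best or score > best[video_id][1]:
                best[video_id] = (frame_time, score)
        ranked = sorted(best.items(), key=lambda item: item[1][1], reverse=True)
        return [(vid, t, s) for vid, (t, s) in ranked]


# Made-up example data: two videos with frames that show a dolphin.
index = FrameKeywordIndex()
index.add("video_a", 12.0, "dolphin", 0.91)
index.add("video_a", 40.0, "dolphin", 0.55)
index.add("video_a", 40.0, "pier", 0.80)
index.add("video_b", 3.0, "dolphin", 0.62)

results = index.search("dolphin")
```

In this sketch, a search for "dolphin" would show the frame at 12 seconds as the thumbnail for video_a rather than its first, middle, or last frame, because that frame has the strongest association with the query.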

A patent application published by Google this past week describes how the search engine might improve the indexing of videos by identifying and indexing both images and audio clips associated with specific keywords in videos. The patent filing is:

Relevance-Based Image Selection
Invented by Gal Chechik and Samy Bengio
Assigned to Google
US Patent Application 20110047163
Published February 24, 2011
Filed: August 24, 2009

Abstract

A system, computer readable storage medium, and computer-implemented method presents video search results responsive to a user keyword query. The video hosting system uses a machine learning process to learn a feature-keyword model associating features of media content from a labeled training dataset with keywords descriptive of their content.

The system uses the learned model to provide video search results relevant to a keyword query based on features found in the videos. Furthermore, the system determines and presents one or more thumbnail images representative of the video using the learned model.

A number of whitepapers from Google authors also provide some hints at the possible future of video indexing:

The system described in the patent relies upon a video annotation index to help searchers find videos, or parts of videos that may be relevant to their queries.

For example, a video that contains a clip or image of a dolphin swimming in the ocean might have that part of the video labeled with keywords such as “dolphin,” “swimming,” “ocean,” and so on.

There are a number of methods mentioned in the patent that might be used to help rank a part of a video for a particular query.

Click-through data may help to determine whether a keyword is appropriate for a particular video. If the same thumbnail image from a video gets chosen on a search for a particular query by a number of searchers, that may indicate a positive association between the query terms and the video.
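As a rough illustration of how click-through data could be turned into a signal, the sketch below counts how often a video's thumbnail was shown and clicked for a query, and computes a smoothed click rate. The log format, the function name `click_through_scores`, and the smoothing constants are assumptions of mine, not details from the patent filing.

```python
from collections import Counter


def click_through_scores(log, prior_clicks=1.0, prior_impressions=10.0):
    """Estimate query-to-video association from a click log.

    `log` is an iterable of (query, video_id, clicked) tuples, one per
    time a thumbnail was shown. The priors are hypothetical smoothing
    values that keep rarely shown videos from getting extreme scores.
    """
    impressions = Counter()
    clicks = Counter()
    for query, video_id, clicked in log:
        key = (query.lower(), video_id)
        impressions[key] += 1
        if clicked:
            clicks[key] += 1
    return {
        key: (clicks[key] + prior_clicks) / (impressions[key] + prior_impressions)
        for key in impressions
    }


# Made-up log: video_a is clicked twice in three showings, video_b never.
log = [
    ("dolphin", "video_a", True),
    ("dolphin", "video_a", True),
    ("dolphin", "video_a", False),
    ("dolphin", "video_b", False),
    ("dolphin", "video_b", False),
]
scores = click_through_scores(log)
```

A higher score for a query and video pair would suggest a positive association between the two, which is the kind of evidence the paragraph above describes.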

A similarity search can compare images and audio from a video against a labeled training dataset, which contains stock images and audio clips that have metadata associated with them, to help identify the unlabeled images and audio from the video. An example of Google using similarity searches can be found in the Google Similar Images search.
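The patent filing describes a learned feature-keyword model. The sketch below swaps in something much simpler, a nearest-neighbor lookup with cosine similarity, just to show how labels from a training set could be carried over to an unlabeled frame. The feature vectors, the labels, and the function names are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_frame(frame_features, labeled_examples, top_k=2):
    """Transfer labels from the top_k most similar labeled examples.

    Each label's score is the sum of the similarities of the nearest
    examples that carry it. This is a simplification, not the learned
    model from the patent filing.
    """
    scored = sorted(
        (
            (cosine_similarity(frame_features, features), labels)
            for features, labels in labeled_examples
        ),
        key=lambda pair: pair[0],
        reverse=True,
    )
    label_scores = {}
    for similarity, labels in scored[:top_k]:
        for label in labels:
            label_scores[label] = label_scores.get(label, 0.0) + similarity
    return label_scores


# Made-up training set: (feature vector, labels) pairs.
labeled_examples = [
    ([1.0, 0.0, 0.0], ["dolphin", "ocean"]),
    ([0.9, 0.1, 0.0], ["dolphin", "swimming"]),
    ([0.0, 0.0, 1.0], ["car"]),
]
frame_features = [1.0, 0.05, 0.0]  # features from an unlabeled video frame

frame_labels = label_frame(frame_features, labeled_examples)
```

Here the frame picks up "dolphin," "ocean," and "swimming" from its two nearest neighbors, with "dolphin" scoring highest because both neighbors carry that label. The same approach could be applied to audio clips once they are represented as feature vectors.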

The patent filing and the whitepapers provide a much deeper look at the technology behind the similarity searches that could be used to associate images and audio from videos with labels that could be used to match up with keywords.

It’s possible that the metadata associated with a video, such as title and description, may continue to be used by the search engine, but additional data from the content of a video itself could improve the results of a video search considerably.

And it might make it easier to find Google’s executive chef on YouTube when he’s featured in a Google Video.


49 thoughts on “How a Search Engine Might Rank Videos Based Upon Video Content”

  1. Hey Bill,

    I agree, I think other factors play a role in how highly ranked a video could be in search results. I knew things like the title and description are used as a determining factor for a search engine, but I wasn’t too sure if more data from the content of the video itself could improve results of a video search. Thanks for sharing what you’ve learned.

  2. Makes sense that Google would be trying to interpret the actual content of the video rather than relying on traditional factors.

  3. This is good. At least we are aware of these things, so that we have some inkling of how to rank our own videos too.

  4. I’m kinda surprised…I thought we were past this level of technology already. I could have sworn that I read somewhere that Google was using (or close to using) audio voice recognition as a means to determine the keywords in videos and to rank them with respect to relevant search terms. According to this post, and correct me if I am wrong, Google is still relying upon user generated association to that end?

    Maybe I imagined it.

    mark

  5. I’m not too sure I understood well this one. For me, it kinda makes me think about Google Goggles for Internet video :)

    From my point of view, the power of that technology (although it must be very intense and expensive computing) is in relating the frames of those videos to one another. For example, an image of the World Trade Center and a plane in the same video could strongly hint to Google that this video is relevant to the 9/11 terrorist attack.

    It will also probably strongly help reduce video spam.

    Am I close Bill?

  6. Thanks for an informative piece. I’m looking for a novel approach to beat my competition in my field and have just started looking into video optimisation as an option, as I love filming and it works perfectly for my type of work, so these details are very helpful. Thanks again.

  7. Hi John,

    Imagine someone spends the time writing a great title, description, choosing appropriate tags, and then uploads those with the wrong video, and doesn’t doublecheck. If a ranking system didn’t look at the content of a video itself, and relied solely upon the text associated with that video, then there’s the potential for some really bad results.

    There are other issues that could make relying just upon textual data less than ideal. One is that a long video might cover more than you really have room to write about in your description. And the same with a video that might cover a few topics.

    Chances are also good that if an indexing system can do a good job of identifying what’s contained within a video, and choosing annotations that fit the images and audio within that video well, that we would see much more relevant results in searches for videos.

  8. Hi Steve,

    I think so, too. I’m not sure how close we are to Google using this method. It seems like they’ve done a fair amount of research in terms of being able to do similarity searches for images, and they’ve developed a way of capturing audio as spectrogram images to search with. The question is, when do they take those methods and actually apply them to videos? I’m guessing it’s something that we’ll see in the not too distant future.

  9. Hi Andrew,

    I’ve gotten a few ideas of things I would like to try based upon this patent filing and the whitepapers. I don’t think Google is using this process yet, but will possibly be soon.

  10. Hi Mark,

    There have been a few whitepapers on this kind of technology in the past three or four years, and I believe Google has been using spectrograms (images of the frequencies in audio over time) to try to identify duplicates of audio, and to find copyrighted audio in YouTube, but I don’t believe that they have started using the technology to index videos yet.

    Google’s similarity search just came out of beta within the past few months, and was added to Google image search. So that feature that could be used to help identify the contents of videos is something that Google has been hard at work upon as well.

  11. Hi Jeremy,

    A lot of the technology that has been, or will be going into Google’s visual search does look like it can be applied to videos as well. You may have seen my post from last month, The Future of Google’s Visual Phone Search?, which describes a number of different types of visual searches that might be used in something like Google Goggles. Google has also described how they might use OCR technology in Street View images to make local search better. I wrote about that in Better Business Location Search using OCR with Street Views.

    At its most basic level, this approach would stop a video at different frames, and see if it can identify objects within videos (or use OCR or other visual technologies), and apply annotations to that particular frame. An example they use is of a video that contains images of a dolphin swimming in the ocean. The search engine might select a frame that shows a dolphin swimming near a pier, and apply annotations to that particular image such as: “dolphin,” “swimming,” “ocean,” “pier.”

    The patent doesn’t suggest that it might take the step that you suggest of determining relationships, like you have with your example of the World Trade Center towers and an airplane. But that might just be a next step that they could take.

  12. Hi Richard,

    You’re welcome. I think this patent describes an important step along the line to better indexing of video content, because it starts focusing upon the actual content contained within a video instead of just the text that accompanies it. Problem is, we don’t know when this technology will be implemented, but I’m guessing that there’s a good chance that it will be. At this point, it’s still pretty important to do a good job with your choice of title and description, in letting people know what a video is about.

  14. Is this the result of the new improved algorithm of Google? Thank God for doing the public a big favor. It would be a lot easier for us to see relevant results because of this technical improvement. It would also help site owners to consider videos as a substitute for articles whenever they are doing SEOs.

  15. I did not think this would even be possible. I have finally started to decipher how search results are affected by keywords and descriptions and from your previous post, even learned how pages can be broken down into blocks to affect search results. I was completely out of the loop when it comes to being able to potentially manipulate search results through video content.

  16. Wasn’t the whole Google Voice experiment aimed at fine tuning their voice recognition so that they can better analyze the spoken word (in videos)? And they have been fine tuning Optical Character Recognition as well to parse text within the video images. Nothing too ground breaking except that they would rank a particular frame for a keyword, which is quite interesting.

  17. I know several companies that have turned to transcribing the audio of each of their videos in order to give Google/YouTube a greater sense of context for their videos. It’ll probably be a while before they start using this technology to index videos, but it’ll save those folks a lot of time.

  18. I think that will be huge progress in video search, but I also think it will take a while to achieve good results. The thing is that video has more than just words in it. It also, and maybe most of all, has context – a general idea that you don’t always say in the video but your viewer understands. No doubt that it will help, along with other parameters, to achieve better video results.

  19. I think Title Tags, content descriptions, quality of subscribers, and tags carry significant ranking factors.

  20. An interesting post, as ever Bill. This blog is my primary must-read SEO blog at the moment – mainly because it focuses on the possibilities of the future.

    Anyway – enough of the flattery – I read a few comments on here about Google’s voice recognition being used to rank and understand videos. Regardless of whether this is being used, this system has a long way to develop until it’s practical and functional.

    If you use YouTube’s voice recognition function – which must be based on this technology – the results are hilarious, hopelessly wrong, and akin to beat poetry. The more mundane the video the better.

    Here’s a sample from a video about tyres:

    “Bag – beverage online brings this bags bank confidential another race thank you tonight nation”

    Part of the difficulty is interpreting accents (American videos usually fare better), but I doubt the system will be used to actually rank videos until it’s out of Beta.

  21. Hi Ricky,

    This is independent of the new Farmer/Panda update from Google.

    A few people from Google have publicly stated that the search engine has been making about 400-500 updates a year over the past few years to their search algorithms. It doesn’t look like this particular process has been added to Google/YouTube yet, but I’m guessing that we will see something along these lines from the search engine.

  22. Hi Trae,

    I think this approach is aimed at making it less possible for people to manipulate search results when it comes to video, by relying less upon the text associated with that video, and more upon the content within the video itself. When something like this is implemented, if you want a video to rank for something it’s going to help to actually include that within the video.

  23. Hi Brent,

    Google’s GOOG411 collected a lot of data about voice and voice recognition before it was discontinued last October. And Google seems to have been doing a lot of research on Optical Character Recognition (OCR) to recognize text within images and videos.

    But, the patent filing I’ve written about doesn’t even mention speech recognition or OCR. Instead, it focuses upon similarity searches for images – see Google Similar Images at http://similar-images.googlelabs.com/

    Google might look at a frame from the video, search for similar images that are labeled, and apply those labels to the images in the video as potential keywords that image can be found for. So, the image aspect of this patent goes beyond OCR.

    As for audio, Google might take the sounds it hears such as music, the sound of a car engine, a dog barking, and other sounds, and transform them into an image of the frequencies over time (in the form of a spectrogram). It might then search through a database of known spectrograms to find similar sounds, and label those.

    The following papers from Google employees describe some of the possible approaches:

    Waveprint: Efficient wavelet-based audio fingerprinting (pdf)
    Audio Fingerprinting: Combining Computer Vision & Data Stream Processing
    Beyond “Near-Duplicates”: Learning Hash Codes for Efficient Similar-Image Retrieval

  24. Hi FinallyFast,

    I think it’s a good idea to transcribe those, and that it can help the search engines understand what’s contained in the videos. It’s not a surprise that Google wants to actually analyze the visual and audio content of those itself – transcripts, titles, and descriptions can be wrong. But, the transcriptions can be a useful feature.

  25. Hi galngal,

    Good points. The processes involved go beyond what might be included in the text that accompanies the videos, and the text that might appear with the images in the videos that might be OCR’ed. If Google can effectively add annotations at different parts of a video based upon what it identifies through a similarity search for images and a waveprint (see the first document in my answer to Brent’s comment), then it can not only help you find a video that is relevant, but it also might lead you directly to the part of the video that matches your query terms.

  26. Hi WDS,

    Those are definitely signals that Google is likely considering when ranking videos. This patent filing points to a number of other signals that they can use as well, and by broadening the range of available signals to consider, may go a long way towards providing even better rankings for those videos.

  27. Hi Twosteps,

    Thanks. Your kind words are much appreciated.

    Voice recognition and transcription are definitely works in progress. I have a phone messaging service that provides me with emails including text transcriptions of my messages. Sometimes those are pretty good, and sometimes they are mystifying.

    I suspect that Google may be further along with images than they are with audio at this point.

  28. Hey Bill,

    I was curious what you thought about the importance of video sitemaps to help leverage all the videos listed throughout a site. On WordPress, there is a plugin that automatically creates a video sitemap and updates Google about it. But for sites that are not utilizing WordPress, there is not too much information about how the average Joe can create the XML video sitemap manually. (I guess WordPress spoils us by creating all these plugins that do all the work behind the scenes.) It’s a great retention business model for us non-programmers as we are unable to duplicate the same functionality on other platforms. So screw you WordPress for being so good, ha ha.

  29. Really interesting post. I never really thought about Google being able to take actual content from a video and use that information to rank them. Considering how popular YouTube has become and people’s desire to watch online videos it only makes sense that Google would try to figure out a way to grab information from the videos themselves.

  30. Personally, I doubt this will ever happen. What’s going to stop people from posting a blank video with someone reading an article? There are so many other ways of ranking videos: description, title (already mentioned I know), views, popularity, bounce rates, etc. I doubt Google will actually get so deep as to pick out keywords from a video… Just my opinion at least.

  31. Hey Bill.

    The name of that WordPress plugin I referenced above and use on my blog is “Google XML Sitemap for Videos”. So I am starting to load up my blog with moving related videos on my posts but I have not actually seen any of those videos show up yet in Google searches. But I guess I need to be patient and this sitemap is supposed to help Google find them. There is really so much good stuff coming out of WordPress these days. The only reason why I did not utilize that platform for my main web site as well was due to this outdated bias I had that WordPress was only good for blogs. But the newer WP templates allow you to remove all of the blogginess. (is that a real word, ha ha?) Anyway I recommend all non-programmers utilize WordPress now, regardless of whether it is an ecommerce site or just a blog. (they have really cool plugins for shopping carts as well)

  32. Hi Moving Rates,

    The search engines are using XML sitemaps as an alternative way of finding content (as opposed to crawling pages and finding links on them), and they can be useful. But, they won’t necessarily help a page or a video rank any higher. Still, there’s value in having those videos found by the search engines, and a plugin like the one you’ve linked to is definitely worth using if you display videos on your pages. It’s possible that videos (and web pages) may be found quicker when they are added to a site if an XML sitemap is used. See: Google Study Shows Use of XML Sitemaps Helps Index Fresh Content Quicker

    I know what you mean about WordPress. It wasn’t all that long ago that people would pay thousands of dollars for a custom content management system (and still do), and instead it’s often just as easy for many businesses to use WordPress as a CMS instead. I’ve seen sites using WordPress as a pure CMS (no blogging at all) as far back as 2005.

    Google does give us some guidance on how to manually create a video sitemap, but it’s not easy to follow for someone who really hasn’t done something like it before:

    http://www.google.com/support/webmasters/bin/answer.py?answer=80472

  33. Hi Andrew,

    Imagine if videos on YouTube were indexed by content like described in this patent filing. I think it would be easier to find stuff that you really want to watch, and it would make YouTube that much better a site. There is a lot of incentive there for Google to get this right.

  34. Hi Vlad,

    If the video is blank, then the search engine wouldn’t annotate different frames on it with keywords associated with images within the video. I don’t see that as a problem. It’s possible that Google might never develop and implement this technology, but it looks like they’ve been doing a lot of research on the topic.

  35. I don’t think that this will happen. It’ll take too much time because someone would have to watch the whole video through and moreover they probably won’t be able to create a spider advanced enough to analyse the video content. (sorry about earlier comment – didn’t know you had to include name!)

  36. I agree with Rahul… Is there any valid proof for this? I mean.. surely, no one watches all the videos…and I doubt they can programmatically read the video content.. but anyway, it would be better for them to rank better videos, because now a lot of useless videos rank high..

  37. Hi Rahul,

    The idea behind this process is that it would be automated, and wouldn’t require people to watch through videos and tag them. If you look at the whitepapers that I’ve linked to in the post, they describe some ways that the search engine could tag different parts of videos in an automated manner to analyze the video content. Will we see this happening sometime soon? I’m not sure. But I think the technology to provide these kinds of search is something that Google is developing the capacity to handle.

  38. Hi Zahn,

    I’m writing about a patent and white papers that describe this process, rather than making something up. Read them for yourself, and ask yourself if this is something that we might possibly see someday, perhaps sooner rather than later. :)

  39. This has made us realise that we should focus more on video content.
    We currently have no content on YouTube, we’re looking to build a show-reel of our work soon and will definitely take all of this into account for the voice-overs and visuals.

    Thanks for the tips.

    Richard
    Search Specialist

  40. Hi Richard,

    I’m not sure that I would take this patent as a sign that now is a good time to focus more on video content.

    The processes described in the patent filing point towards a time when search results for video will probably become more relevant because the actual content of the video will become another signal in the rankings of results. When that might happen, we can’t be sure at this time.

    YouTube would benefit greatly from a better search functionality like this, so it’s possible that they may do something along these lines soon, but no guarantees.

  41. There really is a need for some kind of content based method for ranking videos, but I don’t think audio recognition is it. Many videos have no dialog and as someone else pointed out (sorry too lazy to scroll up and see who it was) spammers could put up any video with an SEO-nonsense voice over. Imagine the worst keyword stuffed spam site being read aloud, probably in butchered English, over random video footage of monkeys, cute kittens, large breasted women, and men getting hit in the genitals. I am sure it would be amusing… until YouTube is filled with them.
    I guess click through, repeat views, comments, viewer ratings, shares, flagging, etc will have to do for now. I do have faith that Google or someone will eventually be able to analyze content accurately enough to amaze us all.

  42. Hi Nick,

    A nonsense voice-over approach wouldn’t help anyone spam video search. It’s much more likely that someone would spam video search using irrelevant titles and metadata than they would through the process I’ve described, which would be more likely to reduce spam.

    The process in the patent looks at both the images and the audio to index the content of a video. It doesn’t focus solely upon audio content, and it definitely doesn’t try to use speech recognition to understand the content of a video.

    Instead, it attempts to understand what is actually present in the images at different frames within the video through an image similarity algorithm, and applies labels to those frames. With audio, a wavelet algorithm is used to find similar sounds (much, much more than just what someone might be saying, which is what speech recognition would give you). Again, those sounds would be matched up with labels at different frames of the video.

  43. Man, alive! I was just discussing this in a circle not too long ago. Then, I find this!?. Excellent figuring, and spot-on discovery! Tag, tag, tag! Being sure to use the proper keywords.

    Keep it up, Bill!!

  44. Hi MudBug,

    Thanks. Using tags and titles and descriptions well when submitting videos is still a very smart thing to do. The purpose behind this patent is for the search engine to add some tags of its own, based upon what it perceives the actual content of the video to be, by comparing the images in those videos to similar images that it has already identified and labeled. Given that, and the fact that Google might be able to use Optical Character Recognition to read text in videos, it can make sense to use images within your videos that are easy for Google’s similar images algorithm to identify, along with actual text labels in the video.
