Machine Translation at Google


One of the challenges that a search engine like Google faces is that, to be useful globally, it needs to provide search for audiences that speak many different languages.

It’s not surprising, then, that the search engine has delved into learning as much as it can about many languages. It even provides a translation service (Google Translate) and a service that lets you search for translated words and phrases in other languages (Google Language Tools).

Google also offers a Google Translate gadget that you can put on your pages to allow visitors to translate your page into their language.

Google has also worked to make its Machine Translation service mobile by making it available on iPhones.

In addition to its automated translation system, Google appears to be starting a translation service, the Google Translation Center, which seems to be open only to “trusted testers” at this point.

Google Blogoscoped has more details in a post titled “Google Translation Center, a New Human Translations Service in the Making.” The new service appears to focus on human translations rather than automated ones.

A patent application from Google describes one aspect of Google Translate: how users of the service might provide alternative translations when they don’t quite agree with the translation Google provides. An Official Google Blog post, “Suggest a better translation,” told us about this ability to give Google feedback on translations in March 2007.

In the patent filing, the inventors explain the need for translation feedback by pointing to some shortcomings of other online translation services:

Such websites usually do not provide a means for users to easily provide feedback on the translation quality, however. If users find a translation to be incorrect or culturally offensive, for example, typically their only resort is to send an email message to the website operator who may or may not route the message to the appropriate person.

Besides, the message might omit the source language version of the text, the translation at issue, or a corrected version of the translation. This makes it exceedingly difficult to analyze translation errors and improve the quality of machine translation based on user feedback.

The patent filing also provides some insight into how the machine translation system and user interface in Google Translate work, and how spam and misspellings might be filtered out of feedback provided to Google Translate.

Machine Translation Feedback
Invented by Jeffrey Chin and Daniel Rosart
US Patent Application 20080195372
Published August 14, 2008
Filed: February 14, 2007

The patent filing describes several ways that users of the service might help improve machine translation.

One is the ability for users of Google Translate to suggest a better translation, which is already available on Google Translate. How does Google handle those suggested alternative translations? The patent filing describes how it might use automated filters, followed by manual review, to evaluate those user-supplied alternatives.

The patent also describes the possibility of allowing users of the service to “rate the quality, correctness or usefulness of the translation using, for instance, a star rating or other scoring mechanism.”

Written comments from users of Google Translate are also a possibility, as well as the ability to flag a translation as being culturally inappropriate.

The machine translation service could learn from previously translated texts and use a language model based upon the probabilities of words and word combinations to offer translations.
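The patent filing doesn’t say what kind of language model is used. As a minimal sketch, assuming a simple bigram model with add-one smoothing (a common textbook choice, not necessarily Google’s), the idea of scoring text by the probabilities of words and word combinations looks something like this. The toy corpus and candidate phrases are made up for illustration.

```python
import math
from collections import Counter

def train_bigram_model(sentences):
    """Count unigrams and bigrams, padding each sentence with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_probability(sentence, unigrams, bigrams):
    """Log probability of a sentence under an add-one (Laplace) smoothed bigram model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

if __name__ == "__main__":
    # Hypothetical target-language text standing in for previously translated documents.
    corpus = ["the red car is fast", "the red house is old", "a red car"]
    unigrams, bigrams = train_bigram_model(corpus)
    for candidate in ["the red car", "the car red"]:
        print(candidate, round(log_probability(candidate, unigrams, bigrams), 2))
```

A candidate translation whose word combinations appear often in the training text (“the red car”) gets a higher score than an awkward ordering of the same words (“the car red”), which is the kind of signal a translation system can use to choose among possible outputs.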

The patent filing also provides some details on how the service translates web pages, and on the user interface involved.

Filtering and Reviewing User Feedback For Machine Translation

While user feedback could be really helpful for a service like Google Translate, there’s the possibility that people might abuse the ability to offer feedback for one reason or another. A combination of automated filters and manual review could be used to try to ensure quality translations.

An automated filter might examine user-provided alternate translations and filter them out if:

  1. The alternate translation is empty or unchanged from the original translation
  2. The alternate translation contains obscene language
  3. The user is suspect (the patent filing doesn’t define what makes a user “suspect”)
  4. The user has submitted more than a given number of alternate translations within a given time period
  5. The user has a history of submitting spam
  6. According to the machine translation system, the alternate translation has a low probability of occurrence in the target language, or
  7. The alternate translation contains redundant words.
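The seven checks above can be sketched as a single filter function. This is a minimal illustration, not the patent’s implementation: the obscene-word list, user lists, rate limit, time window, and probability threshold are all hypothetical placeholders (the patent gives no values), “redundant words” is assumed to mean an immediately repeated word, and the target-language probability comes from a caller-supplied scoring function that returns a log probability.

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    user_id: str
    original_translation: str
    alternate_translation: str
    timestamp: float  # seconds since epoch

@dataclass
class FilterConfig:
    # All values here are hypothetical placeholders; the patent gives no numbers.
    obscene_words: set = field(default_factory=lambda: {"obscenity"})
    suspect_users: set = field(default_factory=set)
    spam_history_users: set = field(default_factory=set)
    max_submissions: int = 5
    window_seconds: float = 3600.0
    min_log_prob: float = -50.0

def has_redundant_words(text):
    """Assumption: treat an immediately repeated word ("the the") as redundant."""
    words = text.lower().split()
    return any(a == b for a, b in zip(words, words[1:]))

def filter_reason(sub, recent_timestamps, config, score):
    """Return why a submission would be filtered out, or None if it passes.

    recent_timestamps: times of this user's earlier submissions.
    score: callable returning a log probability for text in the target language.
    """
    alt = sub.alternate_translation.strip()
    if not alt or alt == sub.original_translation.strip():
        return "empty or unchanged"
    if set(alt.lower().split()) & config.obscene_words:
        return "obscene language"
    if sub.user_id in config.suspect_users:
        return "suspect user"
    in_window = [t for t in recent_timestamps if sub.timestamp - t <= config.window_seconds]
    if len(in_window) >= config.max_submissions:
        # This submission would exceed the allowed number within the window.
        return "too many submissions"
    if sub.user_id in config.spam_history_users:
        return "spam history"
    if score(alt) < config.min_log_prob:
        return "low probability in target language"
    if has_redundant_words(alt):
        return "redundant words"
    return None
```

Returning a reason string rather than a plain true/false makes it easier to see which rule caught a submission, which could help when the filtered results are later checked by people.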

Other filters might also be used, such as one that automatically corrects misspellings in the alternate translation.
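The patent filing doesn’t say how misspellings would be corrected. As one possible approach, assumed here purely for illustration, each word that isn’t in a known target-language vocabulary can be replaced with its closest vocabulary match using Python’s standard `difflib.get_close_matches`. The vocabulary and the 0.8 similarity cutoff are hypothetical.

```python
import difflib

def correct_spelling(text, vocabulary, cutoff=0.8):
    """Replace unknown words with the closest known word, if one is similar enough.

    vocabulary: a set of lowercase target-language words (hypothetical input).
    cutoff: minimum difflib similarity ratio (0 to 1) needed to accept a correction.
    """
    vocab_list = list(vocabulary)
    corrected = []
    for word in text.split():
        if word.lower() in vocabulary:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word.lower(), vocab_list, n=1, cutoff=cutoff)
        # Keep the original word when nothing in the vocabulary is close enough.
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)

if __name__ == "__main__":
    vocab = {"the", "red", "car", "is", "fast"}
    print(correct_spelling("the red carr is fast", vocab))
```

A high cutoff keeps the filter conservative: it only fixes words that are very close to a known word, and leaves everything else for the later review steps.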

The user-supplied alternate translations that are not filtered out may then automatically be sent to a trusted individual to evaluate the translation, or to a community-based review process that allows people to vote on whether they approve of the alternate translation.
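The review step can be sketched as a small decision function. This is a hypothetical illustration: the patent describes sending translations either to a trusted individual or to a community vote, and combining the two, with the trusted reviewer’s ruling taking priority, is an assumption made here, as are the minimum vote count and approval ratio.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReviewState:
    """Votes collected for one user-supplied alternate translation."""
    approvals: int = 0
    rejections: int = 0
    # Set to True or False once a trusted reviewer has ruled on the translation.
    trusted_decision: Optional[bool] = None

def review_outcome(state, min_votes=10, approval_ratio=0.7):
    """Return "approved", "rejected", or "pending" for an alternate translation.

    A trusted reviewer's decision (if any) overrides community votes. Otherwise
    the translation stays pending until min_votes votes arrive, and is approved
    when the share of approvals reaches approval_ratio. Both thresholds are
    hypothetical defaults.
    """
    if state.trusted_decision is not None:
        return "approved" if state.trusted_decision else "rejected"
    total = state.approvals + state.rejections
    if total < min_votes:
        return "pending"
    return "approved" if state.approvals / total >= approval_ratio else "rejected"
```

Requiring a minimum number of votes before deciding keeps a handful of early votes from approving or rejecting a translation on their own.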

While reading the patent filing, I was wondering where Google might find “trusted individuals” or a community to evaluate translations. The Google Translation Center pool of translators might provide one source of reviewers.


16 thoughts on “Machine Translation at Google”

  1. Very interesting post. Google seems to make use of people’s feedback (also in Google Images); that’s a great way to improve their products for free.

  2. If Facebook uses its visitors as translators, Google certainly can.

    Facebook released a new interface to translate Facebook into many other languages. They have created an application called “Translations” that allows users to translate it into any language. This tool enables users to vote on phrases to help get the most commonly used phrases correct.

  3. Hi Idan,

    Thanks for pointing out how Facebook is enabling users to provide translations. I haven’t seen that particular application, but it sounds interesting.

    The Google patent filing describes a way of letting people see alternative translations so that they can vote on them. I don’t know if they will develop that, but it is something that has been considered by the people who wrote the application.

  4. Hi Pablo,

    It’s interesting to see the different ways that Google attempts to collect user feedback, too. I do find the approach used for images pretty interesting.

    The Google Image Labeler acts as a game to help annotate images. It’s based upon the ESP game at Carnegie Mellon University. It looks like Carnegie Mellon has added some new games, too.

  5. Actually, that’s how forum sites work. They benefit from their users. People upload something, others download it and leave feedback, and the users who uploaded start to build a reputation by collecting that feedback. The forum site owner is the only person who makes money in this transaction. Smart of Google …

  6. Hi MGA,

    Good point. Trust in an answer based upon feedback, and ratings of that feedback, can provide measures of reputation. It’s a way that search engines may also weigh how good an annotation might be when someone leaves a comment, annotation, or tag for an image, a video, or a web site.

    The danger of that is that sometimes ratings aren’t based upon the value of an annotation or translation, but rather the popularity of a community member, and ratings based upon personalities instead of quality of translation or annotations can provide results that aren’t very good.

  7. Yes, language is king for search engines. Language structure and the correct use of and understanding of languages are the keys to staying competitive and relevant on the search engine scene.

    “Poets, priests and politicians have words to thank for their positions.”
    — Sting, The Police

    The same can be said for the search engine leaders.

  8. Interesting point, People Finder.

    I find myself amazed sometimes at how a search engine might be able to capture the intent and meanings behind some phrases, or at least attempt to understand concepts from words that are joined together into phrases that give them completely new meanings.

    Understanding the structure and correct usage of languages is important, but it has to be a difficult task.

    Some of the greatest words and the most interesting sentences and sentiments that we have in our language purposefully ignore correct structure and correct usage, and yet people are drawn to songs, to poems, to prose that break those rules.

    One of the challenges that a search engine faces when it comes to translation is handling idioms and other culturally specific statements whose meaning may differ from a literal reading of the words that make them up, or that may mean completely different things in different contexts.

  9. Hi Bill,

    I always wonder whether translation in Google is possible, because the resources behind it have to be a platform of knowledge about language. For example, if they were to translate into Chinese, how would they do it? I don’t really feel that it’s possible, unless they were to build a dictionary to draw on. Even then, there would still be problems when people use it. For example, if I, as an American, were to accidentally switch my browser to Chinese, I would have a lot of trouble switching it back to English. Chinese is also one of the most profound languages in the world. It includes formal Chinese, simplified Chinese, and profound Chinese, but one of the best is still Hanyu Pinyin. But you, as a user, would need to know the language the way a Chinese speaker does before you could use it. So if I were using Google to search a Google site, even with the translation service, would I be able to find any good use for it?
    I think that it will not work either way, unless it is done in a more professional way, based on the users.

  10. Hi Edward,

    The patent application describes a machine translation process that goes beyond a dictionary. It uses a Machine Translation System (MTS) with a language model that relies upon previous translations, human input into those translations, and uses of language on the Web.

    Google provides some more information on the system in their FAQ page on Google Translate.

  11. Google is available in lots of Indian languages, but one major language is missing: Oriya. So I hope Google offers its services in Oriya as well.

  12. Pingback: SEO Daily Reading - Issue 98 « Internet Marketing Blog
  13. Great information on Google Translate.

    It goes to show how important it is to have this resource, but how difficult it is to maintain it. I’m not sure there is a solution when meanings inevitably get lost in translation. Perhaps we would just have to accept that there could be more than one result for a translated word or phrase.

  14. Hi Raquel,

    Thanks. It makes a lot of sense for Google to spend a lot of time and resources on Google Translate, and they are using what they learn in some interesting ways. For example, Google takes advantage of the fact that there might be more than one result for a translated word or phrase to learn about synonyms. They’ve described in the past how they can take an English term like “car mechanic,” translate it into French, then translate it back into English and get a few good translations back, like “car mechanic” and “auto mechanic.” Since both terms share a common suffix (mechanic), there’s a good chance that “car” and “auto” are synonyms.

    So the better Google is at translating from one language to another, the better a search engine they become as well.

  15. Hi Josh,

    That’s something you’ll probably have to ask Google about, rather than me. They have added a lot of languages to their translation services over the past 5 years, and it wouldn’t be surprising if they are working on adding a good number more.
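The round-trip synonym idea from the reply to Raquel can be illustrated with a minimal sketch. It assumes the back-translated English phrases have already been collected (no real translation service is called, and the phrases below are hypothetical inputs), and it proposes a synonym pair whenever two phrases share a common word suffix and differ in exactly one leading word. This is a simplification for illustration, not a description of Google’s actual method.

```python
from itertools import combinations

def common_suffix_length(a, b):
    """Number of trailing words shared by word lists a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def synonym_candidates(round_trip_phrases):
    """Propose synonym pairs from English phrases recovered by translating a
    term into another language and back again.

    Two phrases yield a candidate pair when they share at least one trailing
    word and what remains of each is a single word. Because the shared suffix
    is as long as possible, those leading words are guaranteed to differ.
    """
    candidates = set()
    for a, b in combinations(sorted(set(round_trip_phrases)), 2):
        wa, wb = a.lower().split(), b.lower().split()
        n = common_suffix_length(wa, wb)
        head_a, head_b = wa[: len(wa) - n], wb[: len(wb) - n]
        if n >= 1 and len(head_a) == 1 and len(head_b) == 1:
            candidates.add(tuple(sorted((head_a[0], head_b[0]))))
    return candidates

if __name__ == "__main__":
    # Hypothetical back-translations of "car mechanic" (English -> French -> English).
    phrases = ["car mechanic", "auto mechanic", "mechanic"]
    print(synonym_candidates(phrases))
```

A phrase like “mechanic” on its own shares the suffix but has no leading word left to compare, so it doesn’t produce a pair; only “car” and “auto” come out as a candidate.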
