IBM tackles multilingual web searching

I’ve been enjoying visiting a number of sites written in languages other than English, such as Google.Dirson.com and Référencement, Design et Cie. I often rely on online translation services to read those sites, but when searching the web I still have trouble finding information that isn’t written in English.

It would be nice to have a way to search non-English sites without having to try to translate queries into other languages first.

IBM has a patent filing, published as a patent application last week, which tries to help people find sites in other languages that are relevant to their searches, and might be authority sites on those subjects.

One of the fastest growing groups of users on the web doesn’t speak English, and while they may be searching for information in their native languages, they may also want to see results that include documents in other languages. The method described in this patent application aims to overcome that language barrier without resorting to a translation service to form queries.

Searching hypertext based multilingual web information

Inventors: Ling Zhang
Assignee Name and Address: International Business Machines Corporation
US Patent Application 20060059132
Published March 16, 2006
Filed: July 29, 2005

Abstract

The present invention provides methods, apparatus and systems for searching hypertext based multilingual Web information when searching on a network for keywords to be queried. A method includes: a receiving step for receiving keywords input by a user; a native language hypertext searching step for searching on the network, according to the keywords to be queried, for all hypertexts whose representing language is the same as a language representing the keywords and which matches the keywords to be queried; extracting hyperlinks related to an arbitrary language from all the searched hypertexts; a hyperlink ranking step for ranking the extracted hyperlinks according to the correlativity of the hyperlinks with the keywords to be queried; and returning to the user ranked search result. Thereby, an accurate cross language searching can be provided without extra machine translation effort, being more accurate and objective than machine translation, even than human translation.

This patent application uses anchor text and hyperlinks to sidestep the problems of language translation, helping searchers find authority pages in more than one language based upon a query in their own language.

Here’s one example offered by the patent application on how this could work:

supposing a Chinese Internet user tries to locate the homepage of “Reader’s Digest” magazine, he/she will input “(Reader’s Digest)” (keyword) expressed in Chinese, since many Chinese Web pages include hyperlinks to the Web site of the magazine of “Reader’s Digest” and most of the hypertexts corresponding to the hyperlinks include “Reader’s Digest” expressed in Chinese ( (Reader’s Digest)), by matching the hypertexts with the keyword and analyzing the hyperlink distribution, the URL www (followed by) rd.com of the magazine of “Reader’s Digest” can be retrieved.

This seems fairly simple, and the process of how this could be implemented is spelled out in much more detail in the document. Amongst other implications it may hold, it describes a good reason not to use “click here” as anchor text on your site, and to choose your anchor text carefully.
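
To make the idea concrete, here is a minimal sketch in Python of how the steps in the abstract might fit together. It is an illustration only, not code from the patent application: the Page and Link classes, the cross_language_search function, and the sample pages are all hypothetical, and ranking by a simple count of matching anchors is an assumption standing in for whatever "correlativity" measure the filing intends.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    anchor_text: str  # visible text of the hyperlink
    target_url: str   # page the hyperlink points to (may be in any language)


@dataclass(frozen=True)
class Page:
    url: str
    language: str     # language the page is written in, e.g. "zh"
    links: tuple      # Link objects found on the page


def cross_language_search(keyword, keyword_language, pages):
    """Rank link targets by how many same-language anchors contain the keyword.

    Assumption: each matching anchor counts as one vote for its target.
    """
    votes = Counter()
    for page in pages:
        # Only consider pages written in the same language as the query.
        if page.language != keyword_language:
            continue
        for link in page.links:
            # Match the query against the anchor text, not the target page.
            if keyword in link.anchor_text:
                votes[link.target_url] += 1
    # Most-voted targets first.
    return votes.most_common()


# Made-up sample data. "读者文摘" is used here as the Chinese name of
# Reader's Digest, and "点击这里" means "click here".
pages = [
    Page("http://example.cn/a", "zh", (
        Link("读者文摘", "http://www.rd.com/"),
        Link("点击这里", "http://www.rd.com/"),
    )),
    Page("http://example.cn/b", "zh", (
        Link("《读者文摘》杂志", "http://www.rd.com/"),
    )),
    Page("http://example.cn/c", "zh", (
        Link("读者文摘中文版", "http://example.com/rd-fans"),
    )),
    Page("http://example.org/d", "en", (
        Link("读者文摘", "http://example.org/other"),
    )),
]

results = cross_language_search("读者文摘", "zh", pages)
print(results)
# [('http://www.rd.com/', 2), ('http://example.com/rd-fans', 1)]
```

In this toy data the English-language target rd.com comes out on top for a Chinese query without any translation step. The "click here" link adds nothing to its count, which is the point about anchor text made above.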

4 thoughts on “IBM tackles multilingual web searching”

  1. Excusez le mot, but it’s obvious that native English speaking people have very little understanding of non-English search. For example, Google is brilliant in non-English search. Actually, so are all relevant search engines (MSN, Yahoo! and ASK).

    Although I have to admit that I still feel that Google, for example, relies too heavily on hyperlink-text recognition and too little on actual text recognition (of the indexed document), their search is close to perfect. Even though IBM may have been granted the patent, it feels like other SEs have beaten IBM to the actual implementation of localized search.

  2. Hi Ulco

    You raise some very good points.

    This is still only a patent application at this point, rather than a granted patent, so it is open for challenge from Google, Yahoo!, Fast, or anyone else who might want to contest it.

    What’s interesting about this patent application isn’t whether or not other search engines are doing something similar. We should probably expect them to.

    But, we can look at this document and get an idea of one way that they could be doing something like this. I think it defines their process nicely.

  3. Hi, thanks for quoting my blog ;)

    Well, searching by keywords is very language dependent. You could try searching by “analogy” with social services like Yoono.com: instead of using keywords, you specify a URL and the tool looks for similar ones based on people’s bookmarks. It is less language dependent.

  4. Hi Sebastien,

    Nice find on Yoono.com. I just tried it out, and I like the idea. Found a couple of interesting new sites with it. I wonder if more language independent services like this one will thrive. I hope so.
