How to Stump MT (1) | Language Realm





Machine translation (MT) remains a Holy Grail of computer science, and possibly the killer app of the 21st century. That said, MT systems as of 2007 are woefully inadequate for real-world uses beyond gisting and basic comprehension of simple texts.

Their failure stems from a variety of issues inherent in natural language. In an attempt to clarify these issues, I have been analyzing and classifying syntactic patterns that current systems, whether statistical, frame-based, or traditional, simply do not handle well.

The first class consists of sentences containing homonyms. As an example:

Rose rose for her rose.

Of course this sentence is a bit silly, but it nicely demonstrates the problems that homonyms create for MT systems. Putting the above sentence through Google's MT system gives the following:

Spanish: Rose se levantó para ella color de rosa.
French: Rose s'est levée pour elle rose.
German: Rose stieg für sie rose.
Italian: Rosa è aumentato per lei di rosa.
Portuguese: Rosa levantou-se para ela cor-de-rosa.
Chinese: 她玫瑰玫瑰玫瑰.
Japanese: ローズはばら色彼女のために立上がった。

All of these translations are wrong, each in similar ways. The name is identified correctly, which is a relatively trivial matter since names can be stored in a lookup table or identified based on being capitalized nouns, a rule-based approach that works well in English.
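The two name-identification strategies mentioned above can be sketched roughly as follows. This is a minimal illustration, not a description of how any real MT system works: the `KNOWN_NAMES` table, the `tokenize` helper, and the `find_names` function are all hypothetical names invented for this example.

```python
import re

# Hypothetical lookup table of known names (an assumption for this sketch).
KNOWN_NAMES = {"Rose", "Ban"}

def tokenize(sentence):
    """Split a sentence into word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z']+", sentence)

def find_names(sentence, known_names=KNOWN_NAMES):
    """Return the indices of tokens flagged as names.

    Rule 1: the token appears in the lookup table (case-sensitive).
    Rule 2: the token is capitalized and is not the first word,
            since the first word of an English sentence is
            capitalized regardless of what it is.
    """
    tokens = tokenize(sentence)
    hits = []
    for i, tok in enumerate(tokens):
        in_table = tok in known_names
        capitalized_mid_sentence = i > 0 and tok[0].isupper()
        if in_table or capitalized_mid_sentence:
            hits.append(i)
    return hits

print(find_names("Rose rose for her rose."))
print(find_names("The flowers rose for Anna."))
```

For the example sentence, only the first token is flagged, via the lookup table. The capitalization rule is deliberately crude: it would also flag the pronoun "I" or any capitalized common noun, which is one reason such rules work far better in English than in a language like German.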

However, the verb is mistranslated, and the rest of the sentence therefore falls apart. In other words, a grammatically simple sentence of only five words is actually quite complex, and making sense of it requires some semantic insight, that is to say the ability to see past the words to find the meaning. Current MT systems simply cannot do this, though there is no fundamental issue here that would prevent future systems from overcoming the problem of homonyms.
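To make concrete what "seeing past the words" might involve for this one sentence, here is a toy sketch that labels each occurrence of "rose" by looking at its neighbors. Everything in it is an assumption made for illustration: the word lists, the `tag_rose` function, and the three rules are hypothetical, and a real system would need far richer context than the preceding word.

```python
import re

# Hypothetical word lists (assumptions for this sketch).
KNOWN_NAMES = {"Rose"}
DETERMINERS = {"a", "an", "the", "this", "that",
               "my", "your", "his", "her", "its", "our", "their"}

def tag_rose(sentence):
    """Label each occurrence of 'rose' as NAME, VERB, or NOUN.

    Rule 1: a capitalized form found in the name table is a NAME.
    Rule 2: a form directly after a determiner or possessive is a NOUN.
    Rule 3: anything else is assumed to be the VERB (past tense of 'rise').
    """
    tokens = re.findall(r"[A-Za-z']+", sentence)
    tags = []
    for i, tok in enumerate(tokens):
        if tok.lower() != "rose":
            continue
        prev = tokens[i - 1].lower() if i > 0 else None
        if tok in KNOWN_NAMES:
            tags.append((i, "NAME"))
        elif prev in DETERMINERS:
            tags.append((i, "NOUN"))
        else:
            tags.append((i, "VERB"))
    return tags

print(tag_rose("Rose rose for her rose."))
```

Even this tiny rule set breaks easily: a sentence beginning "Rose petals fell" would be tagged as containing a name, and "her" is treated as a possessive even where it is an object pronoun. The sketch shows only that the three senses can be separated once context is consulted at all.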

This class of sentences may seem trivial, given the choice of example. However, homonyms are quite common in English, and are far more common in languages like Japanese or Arabic, which have relatively few phonemes.

Further, many sentences combine names from several languages, and unless each name is properly identified as such, the results can be amusing. The U.N. Secretary-General from Korea, Ban (a relatively common name), has been the subject of several ambiguous newspaper headlines.

MT systems, statistical or otherwise, will have to overcome the problem of parsing homonyms in context in order to produce a meaningful result. Until then, the output of MT systems will be flawed in amusing, and at times no doubt important, ways.

