Machine translation (MT) remains a Holy Grail of computer science, and possibly the killer app of the 21st century. That said, MT systems as of 2007 are woefully inadequate for any real-world use beyond gisting and basic comprehension of simple texts.
Their failures stem from a variety of linguistic issues. In an attempt to clarify these issues, I have been analyzing and classifying syntactic patterns that current systems, whether statistical, frame-based, or traditional, simply do not handle well.
The second class consists of sentences whose correct translation depends on external context. As an example:
For those of you who do not read Japanese, my English translation of the sentence is "the spinner(s) and air intakes are painted in the squadron color of blue".
The problem here is the word “spinner” (which refers to the cap that goes over the front of a propeller). Should the English be singular or plural? Japanese does not inflect nouns for number, so a noun in Japanese is in an indeterminate state unless a specific number modifies it. The reader supplies the singular or plural as appropriate based on context.
When translating from Japanese to English, the translator must likewise identify this context and then make the appropriate choice in English. In some instances the choice of singular versus plural may be trivial; in others it may be important.
In the example here, an MT system would have to guess, but to me the answer was obvious: the aircraft being described is a P-38, which has two propeller-driven engines and therefore two spinners. The book this sentence came from included a photo of a P-38 next to the sentence, so I did not even have to rely on my knowledge of WWII aircraft. I simply looked at the picture, saw two spinners, and used the plural.
No MT system currently available can make this simple contextual distinction. While there is no theoretical reason future MT systems could not read into deep context and work out whether to use the singular or the plural, actually programming such a system is, by today's standards, far from trivial.
Although the example here, and this class of sentences, may seem trivial, they are not. Just start looking around at all the singulars and plurals we have in English, then think about all the languages (Chinese, Japanese, and Korean, to name but three) that don't mark number on nouns. MT systems somehow have to figure this out, and there are lots of cases like the one above in which they won't.
Of course, an error like the one in my example would matter to no one except WWII enthusiasts. But if the machine being described were used to treat cancer patients, a mistranslation could cost lives, as has tragically already happened because of a translation error (whether that error was human or machine remains to be seen).
Number is important and must be reproduced accurately, whether by human or machine. Currently, and for the foreseeable future, humans are far better at this task because of their inherent ability to perceive context, even deep context, and arrive at the right answer.