English prepositions evolved from Germanic and Latinate origins, and so each has several functions, that is to say meanings and uses. Prepositions play an important role in determining the denotative and connotative meaning of an utterance, and their multiple meanings often create ambiguity.
Such ambiguity is often not even apparent to people because they readily see the context of the utterance and unconsciously select the correct, intended meaning. MT systems cannot do this, and so the results are often wrong. Consider this example…
John was buried by his wife, who died 10 years ago.
I saw this sentence in a newsgroup online. Clearly, the wife did not actually bury her husband John, unless she has joined the ranks of zombies, the living dead, or what have you. I'm certain that here "by" means "next to" or "adjacent to", as in adjacent plots in a cemetery, and not "by means of" or "through the agency of". In other words, "by" is used here in a locational sense, and not to mark the agent of the passive voice construction or with an instrumental sense.
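The reasoning a human reader performs here can be sketched in a few lines of Python. This is only a toy illustration, not how any real translation system works: the `Referent` record and the `alive_at_burial` flag are hypothetical names, and the sketch assumes that context has already told us the wife died before John was buried.

```python
from dataclasses import dataclass

# The two readings of "by" discussed above; not a full sense inventory.
READINGS_OF_BY = ("next to", "through the agency of")


@dataclass
class Referent:
    """Hypothetical record for the noun phrase that follows "by"."""
    description: str
    alive_at_burial: bool  # assumed to come from context ("who died 10 years ago")


def plausible_readings(referent):
    """Keep only the readings of "by" that world knowledge allows."""
    readings = []
    for reading in READINGS_OF_BY:
        # Only a living person can act as the agent of a burial.
        if reading == "through the agency of" and not referent.alive_at_burial:
            continue
        readings.append(reading)
    return readings


wife = Referent("his wife", alive_at_burial=False)
print(plausible_readings(wife))  # ['next to']
```

Had the wife been alive at the time of the burial, both readings would survive the check and the sentence would remain genuinely ambiguous.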
But this multiplicity of meanings, where “by” can be locational, can mark the agent of a passive, or can be instrumental, is an obvious challenge for machine translation systems. I stuck this sentence into Google's automatic translation service and got...
| Language | Google translation |
|---|---|
| Spanish | A su esposa enterró a Juan, que murió hace 10 años |
| French | John a été enterré par son épouse, qui est morte il y a 10 ans |
| German | Vor John wurde von seiner Frau begraben, die 10 Jahren starb. |
They are all wrong. Back-translating these to English, along with the Japanese and Chinese results, we get…
| Language | Back-translation |
|---|---|
| Spanish | By his wife buried John, who died ten years ago. |
| French | John was buried by (meaning: through the agency of) his wife, who died 10 years ago. |
| German | John was buried by (meaning: through the agency of) his wife, who died 10 years ago. |
| Japanese | John was buried by his wife who was dead 10 years before. |
| Chinese | John by buried at his wife died 10 years before. |
The problem is obvious, at least to a human being. As always, humans automatically seek out and find context, even deep context, without a moment’s conscious thought. We are pattern-matching difference engines when it comes to language comprehension. If we find something ambiguous, we take a step back, ponder the situation, and figure it out. We even chuckle or laugh at some of the more egregious ambiguities.
MT does not ponder, cannot contextualize, and just makes a mess. Unlike a human translator, an MT system does not know what it knows or doesn’t know and cannot recognize when an utterance is ambiguous. It therefore cannot take a step back from its efforts, consider the broader context, weigh the possible interpretations against the obviously impossible ones, and make a decision.
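To make concrete what "recognizing when an utterance is ambiguous" might involve at its very simplest, here is a rough Python sketch that flags any preposition with more than one known sense. The sense inventory is a hypothetical, one-entry table built from the example in this post, and the function name is invented for illustration. Noticing the ambiguity is of course the easy part; resolving it still requires the context and world knowledge described above.

```python
import re

# Hypothetical, deliberately tiny sense inventory; a real one would cover
# many more prepositions and senses.
PREPOSITION_SENSES = {
    "by": ["locational", "agentive", "instrumental"],
}


def flag_ambiguous_prepositions(sentence):
    """Return (preposition, senses) pairs for words with more than one sense."""
    flagged = []
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    for token in re.findall(r"[a-z']+", sentence.lower()):
        senses = PREPOSITION_SENSES.get(token, [])
        if len(senses) > 1:
            flagged.append((token, senses))
    return flagged


sentence = "John was buried by his wife, who died 10 years ago."
print(flag_ambiguous_prepositions(sentence))
# [('by', ['locational', 'agentive', 'instrumental'])]
```

A system built along these lines could at least hand flagged sentences to a human translator instead of silently guessing.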
This inability will hinder MT for the foreseeable future. Several solutions are being evaluated in laboratories, but it remains to be seen which of them, if any, will solve the problem.