The March 2006 issue of Scientific American has a good summary of the state of the art in machine translation (MT), including an introduction to the latest phase: statistical machine translation.
The promise of machine translation has languished since the 1950s, when IBM and Georgetown University collaborated on a system to translate Russian into English. Despite repeated predictions of true MT being just around the corner, we remain wholly dependent on human translators for high-quality translations. Current MT can provide quick and dirty translations and gisting of lengthy documents, and can even handle simple material with a well-defined grammar and vocabulary, but that's the limit.
The question is: will statistical machine translation do any better? The approach here is to use the vast amounts of parallel texts now available on the Web as a resource for statistical processing to find likely matches, and to use Bayesian probability analysis when an exact match cannot be found. Google, of course, is deeply involved in this, as are Microsoft and the other major players in the IT industry. MT has been a potential killer app for more than a decade, though no product has reached the market so far.
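The Bayesian idea behind this approach can be sketched in miniature. The standard "noisy channel" formulation picks the English sentence e that maximizes P(e) × P(f | e), combining a translation model estimated from parallel texts with a language model that favors fluent output. The probabilities and vocabulary below are invented toy numbers for illustration, not real data or any particular system's figures:

```python
from itertools import permutations, product

# Toy translation model P(foreign word | english word), as might be
# estimated from word alignments in parallel texts. Numbers are invented.
translation_model = {
    ("maison", "house"): 0.8,
    ("maison", "home"): 0.2,
    ("bleue", "blue"): 0.9,
    ("bleue", "sad"): 0.1,
}

# Toy language model P(english sentence), favoring fluent word order.
language_model = {
    "blue house": 0.05,
    "blue home": 0.02,
    "house blue": 0.001,
}

def candidates(foreign_word):
    """English words the translation model pairs with a foreign word."""
    return [e for (f, e) in translation_model if f == foreign_word]

def best_translation(foreign):
    """Brute-force argmax of P(e) * P(f | e) over candidate sentences."""
    best, best_score = None, 0.0
    for choice in product(*(candidates(f) for f in foreign)):
        # Translation-model score: product of word-pair probabilities.
        tm = 1.0
        for f, e in zip(foreign, choice):
            tm *= translation_model[(f, e)]
        # Try every word order; the language model rewards fluent ones.
        for order in permutations(choice):
            sentence = " ".join(order)
            lm = language_model.get(sentence, 1e-6)  # floor for unseen sentences
            if lm * tm > best_score:
                best, best_score = sentence, lm * tm
    return best

print(best_translation(["maison", "bleue"]))  # → blue house
```

Even this toy shows why the method needs huge amounts of data: any word or word order absent from the estimated tables falls back to a near-zero floor, and real systems spend most of their machinery on smoothing exactly that problem.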
The statistical approach makes several risky assumptions. First, that the parallel texts found on the Web are actually good translations, which without human review cannot be verified, though the greater the number of texts, the more likely this issue will be minimized. Second, that the text to be translated contains no new words, phrases, idioms, slang, or terminology, all of which will be subject to the vagaries of statistical analysis and will likely produce some amusing and some misleading results. Third, the statistical approach ignores, as do other approaches, input from linguists and translators. IBM's Deep Blue was carefully and thoroughly trained by chess experts, which in turn led to its ability to defeat Garry Kasparov in 1997. A similar approach in the development of MT is lacking.
Statistical machine translation has its champions, particularly Kevin Knight, founder of Language Weaver, the only statistical MT company around now. It also has its detractors: Systran, a traditional MT company, as well as the ATA, which believes that MT is all hype with no hope, and Keith Devlin, head of Stanford University's Center for the Study of Language and Information, who believes that human-level translation is not achievable by machines.
So what does the future hold? First, machine translation and machine-assisted translation software that makes translators' work go more smoothly and quickly. Second, incremental improvements in translation memories, online glossaries, parallel text processing (statistical or otherwise), and the rules used in MT software. Finally, if we ever do create a sentient artificial intelligence, it may be able to translate at or even above a human's level. But that future is far off.