Automatic translation of languages by machines has been a standard
fixture of science fiction - and like quite a few other sci-fi
standards, it has been painfully slow to cross over to reality.
A recent breakthrough called 'statistical machine translation' promises
to pluck this fantasy out of its permanent future resident status.
Here's a description of how the technique works (from an
Economist article in June 2006):
"Statistical translation encompasses a range of techniques,
but what they all have in common is the use of statistical analysis,
rather than rigid rules, to convert text from one language into
another. Most systems start with a large bilingual corpus of text. By
analysing the frequency with which clusters of words appear in close
proximity in the two languages, it is possible to work out which words
correspond to each other in the two languages. This approach offers
much greater flexibility than rule-based systems, since it translates
languages based on how they are actually used, rather than relying on
rigid grammatical rules which may not always be observed, and often
have exceptions."
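To make the idea in that excerpt concrete, here is a minimal sketch of how word correspondences can be inferred from a bilingual corpus purely by counting. It is an illustration under stated assumptions, not how any production system works: the function name `cooccurrence_lexicon`, the four-sentence toy corpus and the choice of the Dice coefficient as the association score are all mine, and real systems use far larger corpora and more sophisticated alignment models.

```python
from collections import Counter
from itertools import product


def cooccurrence_lexicon(sentence_pairs):
    """Guess word correspondences from aligned sentence pairs.

    Hypothetical helper for illustration: it scores every
    (source word, target word) pair with the Dice coefficient,
    2 * joint / (source count + target count), and keeps the
    best-scoring target word for each source word.
    """
    src_count = Counter()   # sentences containing each source word
    tgt_count = Counter()   # sentences containing each target word
    pair_count = Counter()  # sentence pairs containing both words

    for src, tgt in sentence_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))

    lexicon = {}
    for (s, t), joint in pair_count.items():
        score = 2 * joint / (src_count[s] + tgt_count[t])
        if score > lexicon.get(s, ("", 0.0))[1]:
            lexicon[s] = (t, score)
    return lexicon


# Toy English-French corpus, invented for this example.
toy_corpus = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]

lexicon = cooccurrence_lexicon(toy_corpus)
for word, (guess, score) in sorted(lexicon.items()):
    print(word, "->", guess, round(score, 2))
```

No grammar rule appears anywhere in the code: "house" ends up paired with "maison" only because the two words keep turning up in the same sentence pairs.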
Not surprisingly, the company at the forefront of statistical machine
translation is Google. Whenever you use Google Translate, this is
what's happening behind the scenes (via
Economist Feb 2010):
"For translation, the company was able to draw on its other
services. Its search system had copies of European Commission
documents, which are translated into around 20 languages. Its
book-scanning project has thousands of titles that have been translated
into many languages. All these translations are very good, done by
experts to exacting standards. So instead of trying to teach its
computers the rules of a language, Google turned them loose on the
texts to make statistical inferences. Google Translate now covers more
than 50 languages, according to Franz Och, one of the company’s
engineers. The system identifies which word or phrase in one language
is the most likely equivalent in a second language. If direct
translations are not available (say, Hindi to Catalan), then English is
used as a bridge."
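The "bridge" idea at the end of that excerpt can be sketched in a few lines. This is only a toy illustration, not Google's implementation: the `translate` function, the `tables` layout and the tiny word lists (with transliterated Hindi) are assumptions made up for the example, and a real system would work with phrases and probabilities rather than single-word lookups.

```python
def translate(word, source, target, tables, pivot="en"):
    """Translate one word, bridging through a pivot language if needed.

    `tables` maps (source_language, target_language) to a word
    dictionary. All names and data here are hypothetical.
    """
    # Use a direct translation table when one exists.
    direct = tables.get((source, target))
    if direct is not None and word in direct:
        return direct[word]

    # Otherwise go source -> pivot -> target.
    to_pivot = tables.get((source, pivot), {})
    from_pivot = tables.get((pivot, target), {})
    bridge = to_pivot.get(word)
    if bridge is None:
        return None
    return from_pivot.get(bridge)


# Toy tables: there is no direct Hindi-Catalan table, only
# Hindi-English and English-Catalan.
tables = {
    ("hi", "en"): {"paani": "water", "kitaab": "book"},
    ("en", "ca"): {"water": "aigua", "book": "llibre"},
}

print(translate("paani", "hi", "ca", tables))
print(translate("kitaab", "hi", "ca", tables))
```

The price of bridging is that any error or ambiguity in the first hop is carried into the second, which is one reason direct translations are preferred when they are available.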
But statistical machine translation currently has a few drawbacks -
which mostly have to do with the kind of ready-made translated texts
it relies on. From yet another recent Economist article:
"It is getting better, but it still struggles with
colloquialisms and idioms. As Ethan Zuckerman, co-founder of Global
Voices and a researcher at Harvard University, puts it: “If you sound
like an EU parliamentarian, we can translate you quite well.”"
What's foxing these gargantuan statistical crunching machines is the
linguistic equivalent of the last mile problem: everyday spoken
language, which is rarely captured and archived - let alone translated
into a dozen languages before archiving.
For advertisers who believe that their work serves a larger purpose
and provides a public good, there might be an opportunity here.
Ads and commercials are routinely translated into different languages -
especially when they come from global multinationals. Because they aim
to communicate with end consumers, they also contain the kind of
colloquialisms and idioms that EU and UN speeches lack.
What if advertisers could provide Google or a non-profit third party
with transcripts of these ads and commercials, along with the
translations that have been professionally created by human experts? If
a sufficiently large number of advertisers commit their future and past
archives, it may end up creating a formidable archive of spoken and
everyday lingo for the statistical inference bots to sink their silicon
teeth into.
The drawback - as some skeptics will point out - is that the language of
advertising may be no less stilted, and no less far removed from
everyday speech, than an EU speech. On the other hand, for advertising
that seeks to leave behind a cultural impact, this could provide a
platform to really make that happen.
If such a thing can be worked out, the advertising itself may have a
limited run - but its value could live on forever by providing us with
better machine translation for ages to come.
[Original pic by Otie]