Lorem Ipsum: Of Good & Evil, Google & China

PCWilliams

Senior Member.
Hi everybody,

It's a rare day when I'm shown something from the conspiratorial world that makes absolutely no sense to me. Usually I can start reading a conspiratorial piece and I'm 3 steps ahead, knowing where they're going and where they're coming from.

Can anybody shed some light on this piece sent to me today?

http://krebsonsecurity.com/2014/08/lorem-ipsum-of-good-evil-google-china/

Is it even a conspiracy?

Thanks. I look forward to the opinions coming from this group.

Paul
 
I think the answer is in the article:

A spokesman for Google said the change was made to fix a bug with the Translate algorithm (aligning ‘lorem ipsum’ Latin boilerplate with unrelated English text) rather than a security vulnerability
Content from External Source
And in the comments:


Harald K
August 18, 2014 at 11:00 am

Krebs… I’ve watched Google Translate for many years. I hung around at their little-known forum (now closed) and answered random questions from people who mostly would never post there again. I even got a nice thank-you mail from a guy at Google Translate once.

And the things I’ve seen there… statistical translation is an ever surprising beast. At one time, “Sarkozy Sarkozy Sarkozy” translated from French to English would become “Bush defeats Blair”. If you translated the Irish national anthem, you would get a bork-up of a literal translation and the British national anthem. (One Irishman was convinced that there was no way “God save the Queen” would appear in there unless it was a deliberate attempt of someone at Google to make fun of the Irish). Sometimes, the translations would be shocking in the bizarre sense they made. At one point, the word “elado” (misspelling of the Spanish word gelado, ice cream) would be translated to “ais krimh” in English!

It’s not surprising at all that the lorem ipsum text would have bizarre translations. If you have an “en” site and a “de” site, but the “de” site only has placeholder text (something that happens very commonly, I bet), that would confuse the hell out of the statistical translator. And in response to confusion, it can get VERY creative.

But there’s no conspiracy here, any more than there was a conspiracy at Google to make fun of the Irish – or the Estonians, or the Israelis, or the Turks, or the Indonesians who all at one time came with such outraged accusations at offensive and weird translations.
Content from External Source

Peter K
August 18, 2014 at 1:56 pm

Ask people who do machine translation and they’ll just laugh at this story. This is just an example of security paranoia seeing patterns and conspiracies and failing to understand how capricious statistical classifiers get in the tail.

Content from External Source

SeanC
August 19, 2014 at 11:33 am

I work in Machine Translation, and the behavior you are seeing is simply the effect of a very VERY undertrained statistical model.

Google Translate is looking for the closest statistical match for the bigram “Lorem Ipsum” (or any of the others you give), and has what are possibly millions of VERY low probability matches for it – and no high probability ones. It’s probably picking some winning sequence with an incredibly low occurrence in the training data that happens to be 10^-15 or so higher than the next nearest neighbor. In other words, you are looking at direct statistical noise.

The regular occurrence of China is probably just due to some bias in the underlying training data. Many modern SMT (statistical Machine Translation) engines are trained on newswire data. Google probably spidered a wide variety of news sources looking for parallel translations, and found a ton of data that had the lorem ipsum placeholder that seemed correlated in some way with real text.

By the way, the “Lorem Ipsum” sequence is frequently known as “greeking”.

Content from External Source
Nothing at all to suggest a conspiracy. Just buggy statistical translation mappings emerging from a semi-automated system, later corrected by hand.
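
For anyone curious, SeanC's point about a winner being picked from a pile of near-zero-probability matches can be sketched with a toy count-based phrase table. This is only an illustration under made-up assumptions, not how Google Translate is actually implemented: the data and the names `build_phrase_table` and `best_translation` are all hypothetical.

```python
from collections import Counter, defaultdict


def build_phrase_table(pairs):
    """Count how often each source phrase was aligned with each target phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def best_translation(table, source):
    """Return (target, probability, margin over the runner-up), or None if unseen."""
    counts = table.get(source)
    if not counts:
        return None
    total = sum(counts.values())
    ranked = counts.most_common(2)
    best, best_count = ranked[0]
    runner_up_count = ranked[1][1] if len(ranked) > 1 else 0
    return best, best_count / total, (best_count - runner_up_count) / total


# Hypothetical toy "parallel corpus" (assumption, invented for illustration).
# A real phrase: many consistent alignments, so one candidate dominates.
pairs = [("bonjour", "hello")] * 50 + [("bonjour", "hi")] * 2
# Placeholder text: aligned once each with 1000 unrelated English phrases...
pairs += [("lorem ipsum", f"unrelated phrase {i}") for i in range(1000)]
# ...plus a slight bias in the data toward one word.
pairs += [("lorem ipsum", "China")] * 2

table = build_phrase_table(pairs)
print(best_translation(table, "bonjour"))
print(best_translation(table, "lorem ipsum"))
```

In this toy setup "bonjour" gets a confident answer, while "lorem ipsum" comes back as "China" with a probability of about 0.002 and a hair-thin margin over a thousand other candidates. The "translation" is just whatever noise happened to come out slightly ahead.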
 
Pareidolia? Kind of funny, the narrative he's invented around it. Like seeing messages in nonsense cut-and-paste spam emails.
 
This is actually a kind of cool "under the hood" look at this kind of system. Google has a few of these automated "learning" systems, and they work in the same basic way that IBM's Watson (the computer that competed on Jeopardy) does: by reading the internet and making connections between terms. They're making different kinds of connections, but from a computer standpoint, it's all the same.

Any computer system is only as good as the information fed into it (garbage in, garbage out), and the internet is not exactly the pinnacle of accurate and consistent information.

The same way Google Translate has collected some weird glitches, Watson also built up some hilarious misconceptions in its time "learning" for Jeopardy. For example, according to its database, Klingons were a proud nation of people from central Asia, while Jedi were a mystical branch of Catholicism primarily based in the UK.

Worked great for a structured general knowledge quiz, but if you were to start feeding it random questions from whatever topic you're discussing over lunch, the results would probably range from bizarre to surreal. Google Translate has the same weirdness in it, and we usually aren't feeding it structured inputs; we're basically throwing our lunchtime conversation at it and occasionally getting back glimpses of the surreal landscape you get when you have a mind (computer or otherwise) process a huge amount of data with no understanding.
 
I've noticed that Google tries to translate things to local equivalents sometimes: like the above "Sarkozy" in French being translated to "Obama" in US English, I've also seen things like "euros" being changed to "pounds" (along with numerical amounts being randomly changed, not according to any real exchange rate).
 
So it basically amounts to a bunch of paranoia over nothing. Thank you everybody for the feedback. I can now put that folder in its appropriate subfolder :)
 
Ah reminds me of the old days when I worked in prepress. Typeface sample books full of paragraphs of lorem ipsum. Greek copy or dummy text to show the faces.
 
Desktop publishing programs have lorem ipsum, too.


translate.yandex.com is a good alternative, especially for Cyrillic languages. There are often interesting differences between that and Google for the same item. For some languages, especially highly inflected or gendered ones, it's useful to do both and read in parallel.
 