Reality is so overrated

When AI can fake practically anything, does 'real' even matter anymore?

“Deep fakes.” Source: Midjourney.

In the months since ChatGPT suddenly appeared in November 2022 and blew our collective minds about what computers are capable of, as well as what that implies for the future of humanity, we've all grown kind of blasé about AI.

Achievements that once seemed miraculous now feel mundane. 

Write essays on virtually any topic in the style of virtually any author? Check. 

Churn out photo-realistic images of people you'd swear you just saw in the produce section at the supermarket? Check.

Produce meticulously detailed illustrations and artistic images? Double-check.

Compose fully orchestrated pop songs that could go into immediate rotation on your favorite streaming service? No problem.

Generate video clips as eerie and compelling as those produced by some of the great masters of cinema? Give me a minute. OK, done. What else ya got? [1]

Are these reproductions perfect? No. There are still glitches and artifacts in many of the things AI tools produce, and other limitations in terms of length and depth. (Not to mention endless legal disputes over copyright and plagiarism.)

But if you didn't know these things were all extruded from machines, would you immediately suspect they were? Probably not. 

And so it goes with the news that Chinese researchers working for Microsoft have taken another step toward obliterating our ability to distinguish actual humans from their virtual doppelgangers. 

They audio know better

The VASA-1 project [2] creates "lifelike audio-driven talking faces generated in real time." The researchers figured out how to take a single AI-generated image, add an existing audio track, and create an entire video from it, using AI to provide the mouth movements and facial expressions. 

Like these three ladies offering their advice on appropriate tooth care:

The voice samples are from actual humans, taken from the VoxCeleb2 dataset, a collection of 1 million+ utterances from more than 6,000 celebrities, scraped from videos uploaded to YouTube. 

(I'm not sure what's more surprising to me: that this training dataset exists, or that it's maintained by AI boffins at the University of Oxford. And what exactly constitutes a "celebrity" in this context? Are there really celebrity dentists? So many questions.)

Here's another video showing different "facial dynamics" you can apply before mashing them up into one coherent piece:

The images in these videos were generated by other AI tools, specifically StyleGAN2 or DALL-E 3, but there's nothing stopping someone from doing the same thing using a photo of an actual person.
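For the technically curious, the recipe boils down to three steps: figure out who the face is, figure out how it should move, then render the frames. Here's a bare-bones Python sketch of that general approach. Microsoft hasn't released the model, so every class and function below is a hypothetical stand-in, not actual VASA-1 code:

# A hand-wavy sketch of the image-plus-audio-to-video recipe, NOT
# VASA-1's actual code. TalkingFaceModel and all of its methods are
# hypothetical stand-ins for the components the paper describes.
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Frame:
    pixels: bytes  # stand-in for one rendered video frame

class TalkingFaceModel:
    """Hypothetical model: one portrait plus one audio clip in, frames out."""

    def encode_identity(self, portrait_png: bytes) -> Any:
        # Extract an identity/appearance latent from the single still image.
        raise NotImplementedError

    def audio_to_motion(self, audio_wav: bytes) -> Any:
        # Map the audio to a sequence of facial-dynamics latents:
        # lip movements, expressions, head pose.
        raise NotImplementedError

    def render(self, identity: Any, motion: Any) -> List[Frame]:
        # Decode the latents into video frames, one per timestep.
        raise NotImplementedError

def make_talking_head(model: TalkingFaceModel,
                      portrait_png: bytes,
                      audio_wav: bytes) -> List[Frame]:
    identity = model.encode_identity(portrait_png)  # who the face is
    motion = model.audio_to_motion(audio_wav)       # how it should move
    return model.render(identity, motion)           # frames to mux with the audio

The trick, as far as I can tell from the paper, is keeping those first two things separate: one latent for what the face looks like, another for how it moves, so any audio clip can drive any portrait.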

Imagine your long-deceased grandmother suddenly coming to life and lecturing you about your poor dental hygiene. Or the Mona Lisa doing a rap song.

Actually, you don't have to imagine that last one — for reasons that are hard to fathom, the VASA people did that for you.

Knowing the kind of blowback they were likely to get, the researchers make a point of saying This Tool Should Not Be Used for Evil, and they're withholding it from the general public:

Kids, please don’t try this at home.

On the other hand, they published a detailed scientific paper on how they did it. So the cat (virtual or real, alive or dead) is kind of out of the bag.

Rolling in the deep (fakes)

AI avatars that look and sound like actual humans are a bit of a cottage industry right now. Besides the many sleazy 'AI girlfriend' apps, companies are using these virtual humAIns for online customer support, employee training videos, canned presentations, and any other job that requires someone to say the same damned things while answering the same damned questions over and over. Here's a demo version I created using Synthesia, one of the leaders in this space: 

Is that indistinguishable from a real person? Hardly. But for most of the jobs I named above, it really doesn't need to be.

The obvious concern is the use of this technology to create convincing deep fakes of real people. We've already seen a few clumsy examples in the realms of politics and news, and those are only going to get more frequent and harder to detect. But the more common application will be offering scammers yet another technique for separating people from their money.

Like the South Korean woman who was recently suckered out of $50,000 by someone using a deep fake video of Elon Musk. Per a story that originally appeared in The Korea Herald, reported by Yahoo Finance:

"Although I have been a huge fan of Musk after reading his biography, I doubted it at first," Ji-sun told The Korea Herald. "‘Musk' sent me his ID card and a photo of him at work. He also explained that he contacts fans randomly."

The deception deepened with a video call where the fake Musk confessed his love for Ji-sun, further manipulating her emotions.

An unkind person might say anyone who was already a huge admirer of Musk was well on her way to being scammed by somebody. Good thing I'm not unkind.

Giving new meaning to the phrase, "crushing it"

Since I'm in a dystopian frame of mind, I'll end with this video, which is not a deep fake but an actual ad for the iPad Pro that some marketing genius at Apple thought was a good idea. [3]

Steve Jobs and Sonny Bono are both rolling over in their graves right about now.

Are you feeling crushed by... just about everything? Share your pain and sorrow in the comments below or email me: [email protected].

[1] Not to mention generating "NSFW content in age-appropriate contexts," a.k.a. AI porn.

[2] VASA stands for Visual Affective Skills... something. They never say what. Audio? Application? Asshattery? It's like a Mad Lib for nerds.

[3] Hat tip to Natalie B for bringing this to my attention and also to my nightmares. 
