
In the past, AI-generated videos of people have tended to show a certain stiffness, glitchiness, or other unnatural quality that made them fairly easy to distinguish from reality. But now, new technology and in-depth data-gathering processes are producing shockingly realistic deepfake videos. MIT Technology Review’s AI reporter, Melissa Heikkilä, saw firsthand just how believable some deepfakes have become.
She allowed an AI startup, Synthesia, to create deepfake videos of her. The final products were so good that even she thought it was really her at first. In this edition of What’s Next in Tech, learn how Synthesia gathered the data necessary to create these videos, and what they suggest about a future in which it’s more and more challenging to figure out what’s real and what’s fake.
What is technology’s role in constructing the future? In the latest issue of MIT Technology Review, we explore how societal, commercial, and cultural factors determine what gets built, how it’s used, and who benefits. To read the full issue and gain access to expert insights and big picture perspectives on the technology topics that matter most to you, subscribe today.
Synthesia’s new technology is impressive but raises big questions about a world where we increasingly can’t tell what’s real.
I’m stressed and running late, because what do you wear for the rest of eternity?
This makes it sound like I’m dying, but it’s the opposite. I am, in a way, about to live forever, thanks to the AI video startup Synthesia. For the past several years, the company has produced AI-generated avatars, but it has now launched a new generation, its first to take advantage of the latest advances in generative AI. These avatars are more realistic and expressive than anything I’ve ever seen. And while the release means almost anyone will now be able to make a digital double, on this early April afternoon, before the technology goes public, the company has agreed to make one of me.
When I finally arrive at the company’s stylish studio in East London, I am greeted by Tosin Oshinyemi, the company’s production lead. He is going to guide and direct me through the data collection process—and by “data collection,” I mean the capture of my facial features, mannerisms, and more—much like he normally does for actors and Synthesia’s customers.
In AI research, there is a saying: Garbage in, garbage out. If the data that goes into training an AI model is garbage, that will be reflected in the model’s outputs. The more data points the model captures of my facial movements, microexpressions, head tilts, blinks, shrugs, and hand waves, the more realistic the avatar will be.
In the studio, I’m trying really hard not to be garbage.
I am standing in front of a green screen, and Oshinyemi guides me through the initial calibration process, where I have to move my head and then my eyes in a circular motion. Apparently, this will allow the system to understand my natural colors and facial features. I am then asked to say the sentence “All the boys ate a fish,” which will capture all the mouth movements needed to form vowels and consonants. We also film footage of me “idling” in silence.
He then asks me to read a script for a fictitious YouTuber in different tones, directing me on the spectrum of emotions I should convey. First I’m supposed to read it in a neutral, informative way, then in an encouraging way, an annoyed and complain-y way, and finally an excited, convincing way.
We film several takes featuring different variations of the script. In some versions I’m allowed to move my hands around. In others, Oshinyemi asks me to hold a metal pin between my fingers as I do. This is to test the “edges” of the technology’s capabilities when it comes to communicating with hands, Oshinyemi says.
Between takes, the makeup artist comes in and does some touch-ups to make sure I look the same in every shot. I can feel myself blushing because of the lights in the studio, but also because of the acting. After the team has collected all the shots it needs to capture my facial expressions, I go downstairs to read more text aloud for voice samples.
This process is very different from the way many AI avatars, deepfakes, or synthetic media—whatever you want to call them—are created. Read the story to learn more about the process and watch the final deepfake videos.
Article link: https://www.linkedin.com/pulse/ai-startup-made-hyperrealistic-deepfake-thats-so-kocke