Is AI going to replace real life voice over actors?
That’s a question that’s consistently popped up over the last several years, and a good question to explore in the second part of this blog series. Part one of the series looked at a voiceover talent’s transition into the digital era.
In part two, we’re diving into artificial intelligence (AI), the impact it’s having on the industry, and the idea of synthetic voices replacing real life actors. While we can never really know what the future may bring, I am inclined to say no, AI will not replace real life voice over actors.
There are several reasons I feel this way:
- AI-generated narration can instantly kill all the drama and excitement when used for a full-length documentary-type TV show (as evidenced by one we ran across a few weeks back).
- Certain AI-powered robot vacuums have voices that are annoying enough for people to return the product based on the voice alone.
- No client has ever told me “We’re going to hire AI to do your job.” And if they did, I have enough work and clients coming in to wish them luck and move on.
And those are just a few of the reasons off the top of my head. I’ll go a bit deeper later in this article, right after we flesh out what’s going on with AI in the world of voice over.
AI and Technology in Voice Over
Voice technology has gotten more sophisticated, something I’ve touched on in past blogs. Examples include:
Synthetic voices are artificially produced replications of the human voice, such as the simple commands in automated messages.
Text-to-speech transforms digital text into human speech. Google’s text-to-speech is a prime example.
One type of synthetic voice uses deep learning to:
- Transform text into human-sounding speech
- Transform speech into text
- Identify a person by their voice command
Voice assistant software can perform tasks or answer questions in response to a person’s voice. Think Amazon’s Alexa or Apple’s Siri.
Synthetic, On-Demand Voice Overs
Today you can actually find software that produces text-to-speech (TTS) voice overs as needed. The TTS voices are often synthetic voices generated by AI and other technology to help them sound less robotic and more human.
If you’d rather start off with a real voice, other technologies let you create a voice bank of a real person’s voice that can later be used to create synthetic speech. Here the voices originally come from human actors, but the sounds are broken down and then reassembled in whatever order the project needs.
On the surface, it may appear as if technology is rapidly homing in on the voice over industry, ready to replace the real life actors. And for some uses, perhaps it has. But that doesn’t mean game over.
Downside of Artificial Voices
True, starting costs for using synthetic voices might be lower than hiring a professional voice over actor. You may also get a rapid turnaround time, along with the ability to endlessly manipulate the recording as desired. But the pros have yet to outweigh the cons.
The cons of using synthetic voices include:
- No unique sound. You may be purchasing a synthetic voice that’s used by dozens, or even hundreds or thousands, of other companies. Not only may the voice be mundane and overused, but it also runs the risk of sounding monotonous.
- Limited library. While you may find plenty of voices speaking standard, non-accented English, you’ll likely be at a loss if you need regional accents or a less common language.
- Endless manipulation. The ability to manipulate the recording to add pauses and other elements may be a plus. But you may also find yourself manipulating the recording to fix errors with acronyms, abbreviations, ambiguities, missed cues, bad flow and other issues that make the recording less than stellar.
- Lack of humanness. The lack of humanness is the greatest downside. No matter how advanced artificially reproduced voices may be, the human brain can pick up on the difference.
Why the Human Touch (and Sound) Matters
Even though developers try to offer a variety of different-sounding AI voices, there is really no way to direct those voices toward the nuance that some clients desire.
When you take the humanness out of voices, you take away what connects us to our audience: human emotion.
Marketing relies heavily on human emotion, as by some estimates up to 90% of our decisions are based on emotion. In fact, emotional branding has become one of the foremost ways many brands attempt to connect with their audience.
Real voice over actors are still the only way to provide the emotional connection that synthetic voices have yet to master. Hiring a seasoned voice actor also brings added benefits, such as getting guidance on the script, receiving input based on their expertise, having an opportunity to provide vocal direction and feedback on the delivery of the copy, and connecting with a real person to develop a professional relationship.
End of the Digital Era?
Voice technology has gotten better – but it’s still not to the point where it can substitute for a real person with real emotions. And even though advances may continue to be made, experts like author Greg Satell say we’re looking at the end of the digital era.
That doesn’t mean we’ll stop using digital technology. But it does mean we won’t necessarily see the same massive explosion of new technologies that we’ve seen of late.
“We’ve spent the last few decades learning how to move fast,” Satell notes. “Over the next few decades we’re going to have to relearn how to go slow again.”
That means going slower and more in-depth, using all this technology for meaningful projects that go beyond automating robo calls or shutting off porch lights with a voice command.
“We are awash in nifty gadgets,” Satell writes, “but in many ways we are no better off than we were 30 years ago.”
Voice Over: Greatest Challenges, Greatest Joys
This all brings me to one final thing I noticed as I was thinking and writing about the changes in the voice over industry. It can be summed up with a quote from French writer Jean-Baptiste Alphonse Karr (written in 1849):
“The more things change, the more they stay the same.”
No matter what has changed in the voice over industry in the way of technology and work methods, two fundamental elements remain the same. One is the greatest challenge and the other is the greatest joy.
- The greatest challenge is still finding and securing work: finding the people who are doing the hiring and getting in front of them. True, you now get in front of them with an email instead of an in-person connection, but you still need a way to stand out from the crowd.
- The greatest joy will always be connecting with clients and delivering exactly what they’re looking for.
Another thing that has stayed the same is the advice I give to new talent looking to break into the industry. The most important thing is a good, clean sound – and the willingness to work hard for what you want.
Just because technology has made some things faster and easier doesn’t mean technology does everything for you. People still have to put in the legwork if they want to succeed – in anything.
I feel lucky to have entered the voice over world when I did, as it’s given me a chance to embrace a wide range of different experiences. While I miss the in-person work with other actors (even though we now have ways to connect live with each other digitally, via Source Connect, Zoom and other methods), I also love being able to work from home and be around my family. Not to mention living anywhere I want and still being able to get a steady stream of work.
I’d also say it’s actually easier for me to find work today than it was when I first started. I have a lot of fingers in a lot of different pies. I have a variety of auditions coming to me from many sources. And clients can find me by searching for female voice talent (or something related) online. SEO is a wonderful thing, and optimization of one’s website is very important these days.
And I have a large stable of clients I built up over the years that I stay in touch with. Once again, the human connection comes into play. Like any meaningful career, it’s not only about the work you do but the relationships you develop… Relationships you just can’t get from technology, no matter how fast, cheap or accessible it may be.