In a very short and bizarre demonstration, Amazon showed how Alexa can mimic the voice of a deceased relative to read bedtime stories or perform other tasks involving “human empathy.” The feature is still experimental, but according to Amazon, Alexa only needs a few minutes of audio to mimic someone’s voice.
The demonstration was tucked into the middle of Amazon’s annual re:MARS conference, an industry gathering that focuses on machine learning, space exploration, and other heady topics. In it, a young child asks Alexa if Grandma can read The Wizard of Oz, and the speaker responds accordingly using a synthesized voice.
“Instead of Alexa’s voice reading the book, it’s the child’s grandmother’s voice,” Rohit Prasad, Amazon’s chief scientist for Alexa AI, told a silent crowd after the demo.
Prasad pointed out that “so many of us have lost someone we love” to the pandemic, claiming that AI speech synthesis can “make their memories stay.” This is clearly a controversial idea: it’s morally questionable, we don’t know how it could affect mental health, and we’re not sure how far Amazon wants to push the technology. (I mean, can I use a deceased relative’s voice for GPS navigation? What’s the point here?)
Amazon’s advanced speech synthesis technology is also worrying. Previously, Amazon duplicated the voices of celebrities such as Shaquille O’Neal using a couple of hours of professionally recorded content. But the company now claims it can copy a voice with just a few minutes of audio. We’ve already seen how speech synthesis technology can be used for fraud and theft, so what happens next?
We don’t know if Amazon will ever introduce this speech synthesis feature on its smart speakers. But audio deepfakes are basically unavoidable. They are already a big part of the entertainment industry (see Top Gun: Maverick for example), and Amazon is just one of many companies trying to clone voices.
Source: Amazon via The Verge