Amazon plays catch-up with new Nova AI models to generate voices and video

Nova Sonic is Amazon’s real-time AI voice answer to Google’s Gemini and OpenAI’s GPT-4o.

Amazon’s Nova Sonic logo.
Image: Alex Castro / The Verge
Umar Shakir is a former news writer for The Verge.

Amazon is showing off new AI technology this week: a more conversational voice model meant to compete with the likes of Gemini Live and OpenAI’s Advanced Voice Mode, and an update to its video-generation model.

The new Nova Sonic voice model handles real-time speech processing and AI voice generation for conversational applications, Amazon says. Nova Sonic uses a “unified model architecture” that Amazon claims is better than approaches that chain together separate models for speech recognition, speech-to-text conversion, response generation, and text-to-audio. Amazon says Nova Sonic can also better detect a speaker’s tone and deliver more natural responses.

Nova Sonic is available to try through Amazon’s Bedrock developer platform, and the company says it can be used to make things like customer service bots or build AI agents for travel, education, healthcare, and a variety of other industries. “Components” of Nova Sonic are already being used in Amazon’s new Alexa Plus assistant, Amazon’s Rohit Prasad, SVP and head scientist of AGI, told TechCrunch.

As for video, Amazon announced Nova Reel 1.1, which the company says improves on version 1.0 in both quality and latency. It can also now keep styles consistent across multiple six-second scenes that are cut together into a full video of up to two minutes.
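
For a sense of scale, those figures imply that a two-minute video is stitched from at most 20 six-second scenes. Here is a minimal Python sketch of that arithmetic. The function name `scenes_needed` and the constants are illustrative only and are not part of any Amazon API, and the sketch assumes every scene runs exactly six seconds.

```python
import math

SCENE_SECONDS = 6            # per-scene length cited in the article
MAX_VIDEO_SECONDS = 2 * 60   # "up to two minutes"


def scenes_needed(video_seconds: int, scene_seconds: int = SCENE_SECONDS) -> int:
    """Return how many fixed-length scenes are needed to cover video_seconds.

    Hypothetical helper for illustration; assumes equal-length scenes.
    """
    if video_seconds <= 0 or video_seconds > MAX_VIDEO_SECONDS:
        raise ValueError("video length must be between 1 and 120 seconds")
    # Round up: a partial scene still counts as one scene.
    return math.ceil(video_seconds / scene_seconds)


print(scenes_needed(MAX_VIDEO_SECONDS))  # 20
print(scenes_needed(45))                 # 8
```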