A new generative engine and three voices are now generally available on Amazon Polly

Today, we are pleased to announce the general availability of the generative engine of Amazon Polly featuring three voices: Ruth and Matthew in American English and Amy in British English. This new generative engine has been trained with a mix of publicly available and proprietary data, encompassing various voices, languages, and styles. It boasts the highest precision in rendering context-dependent prosody, pausing, spelling, dialectal properties, foreign word pronunciation, and more.

Amazon Polly is an advanced machine learning (ML) service that seamlessly converts text into natural-sounding speech, known as text-to-speech (TTS) technology. With Amazon Polly, users can now access high-quality, humanlike voices in multiple languages, allowing for the selection of the ideal voice for different locales and countries to enhance speech-enabled applications.

Amazon Polly offers a range of voice options, including neural, long-form, and generative voices, all of which significantly improve speech quality and produce highly expressive, emotionally adept speech. Users can customize speech rate, pitch, and volume using Speech Synthesis Markup Language (SSML) tags, and fast response times make for lifelike voices and engaging user experiences.
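As a rough illustration of the SSML customization mentioned above, here is a minimal Python sketch that wraps plain text in a prosody tag. The helper name build_ssml is our own and not part of any AWS SDK, and support for individual tags and attributes differs between engines, so treat the attribute values as assumptions to check against the supported SSML tags list in the documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, rate=None, pitch=None, volume=None):
    """Wrap plain text in <speak>, adding <prosody> for any settings given.

    Hypothetical helper for illustration only. Not every engine accepts
    every prosody attribute, so verify the ones you use.
    """
    settings = [("rate", rate), ("pitch", pitch), ("volume", volume)]
    attributes = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in settings
        if value is not None
    )
    body = escape(text)  # escape &, < and > so the text is valid XML
    if attributes:
        body = f"<prosody{attributes}>{body}</prosody>"
    return f"<speak>{body}</speak>"


print(build_ssml("Tom & Jerry", rate="slow", volume="loud"))
```

To use the result, pass it as the text and set the text type to ssml (--text-type ssml in the AWS CLI, TextType="ssml" in the Python SDK).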

With the addition of the generative engine, Amazon Polly now supports four voice engines: standard, neural, long-form, and generative.

Standard TTS voices, introduced in 2016, utilize traditional concatenative synthesis, stringing together phonemes of recorded speech to produce natural-sounding synthesized speech. However, variations in speech and segmentation techniques limit the quality of speech.

Neural TTS (NTTS) voices, introduced in 2019, employ a sequence-to-sequence neural network to convert phonemes into spectrograms and a neural vocoder for generating audio signals, resulting in even higher quality humanlike voices.

Long-form voices, introduced in 2023, utilize cutting-edge deep learning TTS technology to captivate listeners’ attention for longer content like news articles, training materials, and marketing videos.

In February 2024, Amazon introduced the Big Adaptive Streamable TTS with Emergent abilities (BASE) model, enabling the generative engine in Amazon Polly to create humanlike synthetically generated voices for use in various applications.

Here are the new generative voices:

Name: Ruth
Locale: en_US
Gender: Female
Language: English (US)
Sample prompt: Selma was lying on the ground halfway down the steps. ‘Selma! Selma!’ we shouted in panic.

Name: Matthew
Locale: en_US
Gender: Male
Language: English (US)
Sample prompt: The guards were standing outside with some of our neighbours, listening to a transistor radio. ‘Any good news?’ I asked. ‘No, we’re listening to the names of people who were killed yesterday,’ Bruno replied.

Name: Amy
Locale: en_GB
Gender: Female
Language: English (British)
Sample prompt: ‘What are you looking at?’ he said as he stood over me. They got off the bus and started searching the baggage compartment. The tension on the bus was like a dark, menacing cloud that hovered above us.

[Audio samples for each prompt: NTTS voice and generative voice]

You can select from these voice options to suit your application and use case. For more information on the generative engine, refer to the Generative voices section in the AWS documentation.

To get started with using generative voices, access the new voices via the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDKs.

To begin, go to the Amazon Polly console in the US East (N. Virginia) Region and choose the Text-to-Speech menu in the left pane. Select Ruth or Matthew in English (US), or Amy in English (UK), to make the Generative engine available. Enter your text, then listen to or download the generated speech.

Using the CLI, you can list the voices that utilize the new generative engine:

$ aws polly describe-voices --output json --region us-east-1 \
| jq -r '.Voices[] | select(.SupportedEngines | index("generative")) | .Name'

Matthew
Amy
Ruth
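If you prefer the AWS SDK for Python (Boto3), the same filter can be sketched roughly as follows. The function names and SAMPLE_RESPONSE are our own illustrations: the sample is a hand-written stand-in shaped like a DescribeVoices response, not real service output, and the live call assumes Boto3 is installed and AWS credentials are configured.

```python
def generative_voice_names(response):
    """Return names of voices whose SupportedEngines include 'generative'."""
    return [
        voice["Name"]
        for voice in response.get("Voices", [])
        if "generative" in voice.get("SupportedEngines", [])
    ]


# Hand-written stand-in for a DescribeVoices response (illustrative only).
SAMPLE_RESPONSE = {
    "Voices": [
        {"Name": "Matthew", "LanguageCode": "en-US",
         "SupportedEngines": ["generative", "neural", "standard"]},
        {"Name": "Joanna", "LanguageCode": "en-US",
         "SupportedEngines": ["neural", "standard"]},
        {"Name": "Amy", "LanguageCode": "en-GB",
         "SupportedEngines": ["generative", "neural", "standard"]},
    ]
}


def list_generative_voices(region="us-east-1"):
    """Query Amazon Polly directly (needs boto3 and AWS credentials).

    For brevity this sketch reads a single response and ignores NextToken.
    """
    import boto3  # imported lazily so the offline demo below runs without it

    client = boto3.client("polly", region_name=region)
    return generative_voice_names(client.describe_voices())


print(generative_voice_names(SAMPLE_RESPONSE))
```

Keeping the filter separate from the network call lets you check the logic offline before pointing it at your account.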

Now, run the synthesize-speech CLI command to synthesize sample text into an audio file (hello.mp3) using the generative engine and a supported voice ID.

$ aws polly synthesize-speech --output-format mp3 --region us-east-1 \
--text "Hello. This is my first generative voice!" \
--voice-id Matthew --engine generative hello.mp3
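The same request can be sketched with the AWS SDK for Python (Boto3). The helper names build_request, synthesize, and save_hello are hypothetical, and the sketch assumes Boto3 is installed and credentials are configured. The client is passed in as a parameter so the request logic can be exercised without calling the service.

```python
def build_request(text, voice_id="Matthew", engine="generative",
                  output_format="mp3"):
    """Assemble keyword arguments for the SynthesizeSpeech operation."""
    return {
        "Text": text,
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": output_format,
    }


def synthesize(client, text, **options):
    """Call synthesize_speech on the given Polly client, return audio bytes."""
    response = client.synthesize_speech(**build_request(text, **options))
    return response["AudioStream"].read()


def save_hello(path="hello.mp3", region="us-east-1"):
    """Write a short greeting to an MP3 file (needs boto3 and credentials)."""
    import boto3  # imported lazily; only this function needs it

    client = boto3.client("polly", region_name=region)
    audio = synthesize(client, "Hello from the generative engine.")
    with open(path, "wb") as audio_file:
        audio_file.write(audio)
    return path
```

Calling save_hello() should produce a file comparable to the hello.mp3 created by the CLI command, though we have not verified that the output is byte-identical.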

For more code examples using AWS SDKs, visit the Code and application examples section in the AWS documentation. Explore Java and Python code examples, application examples for web applications in Java or Python, as well as iOS and Android applications.

The new generative voices of Amazon Polly are now available in the US East (N. Virginia) Region. Pay only for what you use based on the number of characters converted to speech. Learn more on the Amazon Polly Pricing page.
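Because billing is based on characters, a back-of-the-envelope estimate is easy to sketch. The function name is our own and the rate is a parameter you supply: the value in the example is a placeholder, not an actual Amazon Polly price, and the character count here is a plain len() that may differ from how the service counts billable characters.

```python
def estimate_cost(texts, price_per_million_chars):
    """Return (total characters, estimated cost) for a batch of texts.

    price_per_million_chars is a caller-supplied rate; look up the real
    figure for your engine and Region on the Amazon Polly Pricing page.
    """
    total_chars = sum(len(text) for text in texts)
    return total_chars, total_chars / 1_000_000 * price_per_million_chars


# Placeholder rate of 10.0 per million characters, for illustration only.
chars, cost = estimate_cost(["a" * 500, "b" * 1500], 10.0)
print(chars, round(cost, 4))
```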

Try out the new generative voices in the Amazon Polly console today, and send feedback to AWS re:Post for Amazon Polly or through your usual AWS Support contacts.

— Channy
