Friday, May 16, 2025
News PouroverAI

This Machine Learning Research from Amazon Introduces BASE TTS: A Text-to-Speech (TTS) Model that Stands for Big Adaptive Streamable TTS with Emergent Abilities

February 27, 2024
in Data Science & ML


Recent advancements in generative deep learning models have revolutionized fields such as Natural Language Processing (NLP) and Computer Vision (CV). These domains were previously dominated by specialized models trained with supervision, but they are now shifting towards generalized models that can perform diverse tasks with minimal explicit guidance.

Large language models (LLMs) in NLP have shown versatility by successfully tackling tasks like question answering, sentiment analysis, and text summarization despite not being specifically designed for them. Similarly, in CV, models pre-trained on extensive image-caption pairs have achieved top performance on image-to-text benchmarks and have demonstrated remarkable results in text-to-image tasks. This progress has largely been enabled by Transformer-based architectures, which leverage significantly larger datasets than previous models.

A similar trend has emerged in Speech Processing and Text-to-Speech (TTS). Models now leverage thousands of hours of data to produce speech that is increasingly close to human quality. Until 2022, neural TTS models were primarily trained on a few hundred hours of audio data, which limited their ability to generalize beyond the training data and to render complex and ambiguous texts expressively.

To address this limitation, researchers at Amazon AGI have introduced BASE TTS, a large TTS (LTTS) system trained on approximately 100K hours of public domain speech data. BASE TTS is designed to model the joint distribution of text tokens and discrete speech representations, known as speech codes. These speech codes are crucial as they allow the direct application of methods developed for LLMs. By employing a decoder-only autoregressive Transformer, BASE TTS can capture complex probability distributions of expressive speech, thus improving prosody rendering compared to early neural TTS systems.
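To make the idea of modeling text tokens and speech codes jointly more concrete, here is a minimal Python sketch. It is not the BASE TTS implementation: the toy vocabulary, the `<bos>` and `<sep>` markers, the id offset for speech codes, and the `uniform_prob` stand-in for a trained Transformer are all assumptions made for illustration. It only shows how a decoder-only autoregressive model treats both kinds of tokens as one sequence and scores it with the chain rule.

```python
import math

# Hypothetical toy text vocabulary; the real tokenizer is not described in the article.
TEXT_VOCAB = {"<bos>": 0, "h": 1, "i": 2, "<sep>": 3}
# Assumption: speech codes share one id space with text, shifted past the text ids.
SPEECH_CODE_OFFSET = len(TEXT_VOCAB)


def build_joint_sequence(text_tokens, speech_codes):
    """Concatenate text ids and offset speech-code ids into a single sequence."""
    text_ids = [TEXT_VOCAB[t] for t in text_tokens]
    code_ids = [SPEECH_CODE_OFFSET + c for c in speech_codes]
    return [TEXT_VOCAB["<bos>"]] + text_ids + [TEXT_VOCAB["<sep>"]] + code_ids


def next_token_pairs(sequence):
    """Return (prefix, target) pairs: the prediction task of a decoder-only model."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


def sequence_log_prob(sequence, prob_fn):
    """Chain rule: log p(x) is the sum over i of log p(x_i | x_<i)."""
    return sum(
        math.log(prob_fn(prefix, target))
        for prefix, target in next_token_pairs(sequence)
    )


def uniform_prob(prefix, target, vocab_size=8):
    """Stand-in for a trained Transformer: every token is equally likely."""
    return 1.0 / vocab_size


# Two text tokens followed by three (hypothetical) speech codes.
joint = build_joint_sequence(["h", "i"], [2, 0, 3])
print(joint)
print(sequence_log_prob(joint, uniform_prob))
```

In a real system, `uniform_prob` would be replaced by the Transformer's predicted distribution over the next token, and generation would sample speech codes one at a time after the text prefix.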

Researchers also propose speaker-disentangled speech codes built on a WavLM Self-Supervised Learning (SSL) speech model. These speech codes, which aim to capture only phonemic and prosodic information, outperform baseline quantization methods. They can be decoded into high-quality waveforms using a simple, fast, and streamable decoder, even with a high level of compression.
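For readers unfamiliar with discrete speech representations, the sketch below shows the generic idea of quantization: mapping continuous feature frames to the index of the nearest vector in a codebook. This is plain nearest-neighbour vector quantization, shown only as an illustration. It is not the speaker-disentangled WavLM-based method proposed by the researchers, and the two-dimensional `frames` and `codebook` values are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codebook vector."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


def dequantize(codes, codebook):
    """Look up the codebook vector for each discrete code."""
    return [codebook[c] for c in codes]


# Hypothetical toy data: three codebook entries and four feature frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8], [0.05, 0.1]]

codes = quantize(frames, codebook)
print(codes)  # [0, 1, 2, 0]
```

The resulting integer codes are what a language-model-style system can predict token by token; a separate decoder then turns them back into audio.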

Their contributions are threefold: introducing BASE TTS, which they describe as the largest TTS model to date; demonstrating that scaling it to larger datasets and model sizes improves its ability to render appropriate prosody for complex texts; and introducing novel discrete speech representations that outperform existing methods. These advancements represent significant progress in the field of TTS and lay the groundwork for future research and development.

Check out the Paper. All credit for this research goes to the researchers of this project.

Arshad is an intern at MarktechPost. He is currently pursuing his Int. MSc in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at a fundamental level leads to new discoveries, which in turn lead to advances in technology. He is passionate about understanding nature at a fundamental level with the help of tools like mathematical models, ML models, and AI.

Copyright © 2023 PouroverAI News.