News PouroverAI

ByteDance AI Research Introduces StemGen: An End-to-End Music Generation Deep Learning Model Trained to Listen to Musical Context and Respond Appropriately

December 18, 2023
in AI Technology


Music generation with deep learning involves training models to create musical compositions by imitating the patterns and structures found in existing music. Commonly used architectures include recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and transformers. This research explores a new approach to generating musical audio with non-autoregressive, transformer-based models that respond to musical context. Unlike existing models that rely on abstract conditioning such as text descriptions, this paradigm emphasizes listening to the music and responding to it. The study builds on recent advances in the field and describes two improvements to the architecture.

Researchers from SAMI at ByteDance Inc. introduce a non-autoregressive, transformer-based model that listens and responds to musical context, built on the publicly available EnCodec checkpoint released for the MusicGen model. Evaluation uses standard metrics and a music information retrieval descriptor approach, including Fréchet Audio Distance (FAD) and Music Information Retrieval Descriptor Distance (MIRDD). The resulting model delivers competitive audio quality and strong musical alignment with its context, as validated by objective metrics and subjective Mean Opinion Score (MOS) tests.
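FAD compares the statistics of embeddings computed from real and generated audio: it fits a multivariate Gaussian to each set and reports the Fréchet distance between them, ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The following is a minimal NumPy sketch of that distance, assuming the embeddings have already been extracted. Standard FAD typically uses a pretrained audio classifier such as VGGish for this step; the embedding model used in the paper is not stated in this summary. The function names and the random "embeddings" are illustrative only, not taken from the paper's code.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(emb_real: np.ndarray, emb_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n, d) embedding sets."""
    mu_r, mu_g = emb_real.mean(axis=0), emb_gen.mean(axis=0)
    cov_r = np.cov(emb_real, rowvar=False)
    cov_g = np.cov(emb_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals Tr((S cov_g S)^(1/2)) with S = cov_r^(1/2);
    # S cov_g S is symmetric PSD, so its eigenvalues are real and non-negative.
    s = _psd_sqrt(cov_r)
    eig = np.clip(np.linalg.eigvalsh(s @ cov_g @ s), 0.0, None)
    tr_covmean = np.sqrt(eig).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_covmean)


# Hypothetical embeddings standing in for real and generated audio clips.
rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 8))
close = rng.normal(size=(1000, 8))               # same distribution as `real`
shifted = rng.normal(loc=2.0, size=(1000, 8))    # means differ by about 2 per dimension
fad_close = frechet_distance(real, close)
fad_shifted = frechet_distance(real, shifted)    # much larger than fad_close
```

Rewriting the trace term through the symmetric matrix S cov_g S lets the sketch rely on `eigh`/`eigvalsh` instead of a general (possibly complex-valued) matrix square root. A lower FAD means the generated audio's embedding statistics are closer to those of the real audio.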

The research highlights recent progress in end-to-end musical audio generation with deep learning, which borrows techniques from image and language processing. It emphasizes the challenge of aligning stems (the individual instrument tracks that together make up a piece of music) in music composition and critiques existing models that rely on abstract conditioning. It proposes a training paradigm in which a non-autoregressive, transformer-based model learns to respond to musical context. It introduces two conditioning sources and frames the task as conditional generation. Evaluating such models requires a combination of objective metrics, music information retrieval descriptors, and listening tests.

The method uses a non-autoregressive, transformer-based model for music generation that operates on tokens from a separate audio encoding model with a residual vector quantizer (RVQ). It combines multiple audio channels into a single sequence element by concatenating their embeddings. Training uses a masking procedure, and classifier-free guidance is applied during token sampling to strengthen alignment with the audio context. Model performance is assessed with objective metrics, including Fréchet Audio Distance and Music Information Retrieval Descriptor Distance, by generating example outputs and comparing them with real stems.
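Residual vector quantization encodes each audio frame in stages: the first codebook quantizes the frame's latent vector, and each later codebook quantizes the error (residual) left by the earlier stages, so a frame becomes a short stack of code indices. The sketch below illustrates this with toy NumPy codebooks, plus one way of turning several code streams into a single sequence element by concatenating their embeddings. Codebook sizes, embedding tables, and function names are hypothetical; real EnCodec codebooks are learned, and this summary does not detail exactly which streams the paper concatenates.

```python
import numpy as np


def rvq_encode(x: np.ndarray, codebooks: list) -> np.ndarray:
    """Residual vector quantization of x with shape (n, d).

    Each codebook has shape (k, d). Stage i quantizes the residual left by the
    earlier stages. Returns code indices of shape (n, number_of_stages).
    """
    residual = x.astype(float)
    codes = []
    for cb in codebooks:
        # Squared Euclidean distance from each residual to each codeword: (n, k).
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        codes.append(idx)
        residual = residual - cb[idx]  # hand the remaining error to the next stage
    return np.stack(codes, axis=1)


def rvq_decode(codes: np.ndarray, codebooks: list) -> np.ndarray:
    """Reconstruct each vector as the sum of its chosen codeword from every stage."""
    return sum(cb[codes[:, i]] for i, cb in enumerate(codebooks))


def concat_embeddings(codes: np.ndarray, tables: list) -> np.ndarray:
    """Hypothetical: embed each stream's codes and concatenate them per time step.

    codes: (T, S) integer indices; tables[s]: (k, e). Returns (T, S * e), so each
    time step becomes one sequence element for the transformer.
    """
    return np.concatenate([tables[s][codes[:, s]] for s in range(codes.shape[1])], axis=-1)


# Toy example: two 2-D codebooks with two codewords each.
codebooks = [np.array([[0.0, 0.0], [4.0, 4.0]]), np.array([[0.0, 0.0], [1.0, -1.0]])]
x = np.array([[5.0, 3.0], [0.0, 0.0]])
codes = rvq_encode(x, codebooks)       # [[1, 1], [0, 0]]
recon = rvq_decode(codes, codebooks)   # equals x exactly for this toy input
```

Because each stage only has to model what the previous stages missed, adding stages refines the reconstruction without enlarging any single codebook.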

The study evaluates the generated stems using standard metrics and a music information retrieval descriptor approach, including FAD and MIRDD. Comparisons with real stems indicate that the models achieve audio quality comparable to state-of-the-art text-conditioned models and show strong musical coherence with their context. A Mean Opinion Score test with musically trained participants further confirms that the model produces plausible musical output. MIRDD, which assesses how closely the distribution of generated stems matches that of real stems, provides a measure of musical coherence and alignment.

In summary, the research can be distilled into the following points:

  • The research proposes a new training approach for generative models that can respond to musical context.
  • The approach introduces a non-autoregressive language model with a transformer backbone and two new improvements: multi-source classifier-free guidance and a causal bias during iterative decoding.
  • The models, trained on open-source and proprietary datasets, achieve state-of-the-art audio quality.
  • Standard metrics and a music information retrieval descriptor approach validate this audio quality.
  • A Mean Opinion Score test confirms the model’s capability to generate realistic musical outcomes.
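Classifier-free guidance, as usually formulated, moves the model's prediction away from its unconditional output and toward its conditional one: guided = uncond + w * (cond - uncond), where w > 1 strengthens the condition. A natural way to extend this to several conditioning sources is to add one separately weighted term per source. The sketch below assumes that compositional form, with per-source logits obtained by running the model with only that source supplied; the function names and toy logits are hypothetical, and the paper's exact multi-source formulation may differ.

```python
import numpy as np


def multi_source_cfg(logits_uncond, logits_per_source, weights):
    """Guided logits = uncond + sum_i w_i * (cond_i - uncond).

    Assumed compositional form: logits_per_source[i] are the model's logits when
    only conditioning source i is supplied; weights[i] is its guidance scale.
    With a single source this reduces to standard classifier-free guidance.
    """
    if len(logits_per_source) != len(weights):
        raise ValueError("need one guidance weight per conditioning source")
    uncond = np.asarray(logits_uncond, dtype=float)
    guided = uncond.copy()
    for logits_c, w in zip(logits_per_source, weights):
        guided += w * (np.asarray(logits_c, dtype=float) - uncond)
    return guided


def sample_token(logits, rng, temperature=1.0):
    """Sample one token index from softmax(logits / temperature)."""
    z = np.asarray(logits, dtype=float) / temperature
    p = np.exp(z - z.max())  # subtract the max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(p), p=p))


rng = np.random.default_rng(0)
uncond = np.array([0.0, 0.0, 0.0])
cond_a = np.array([1.0, 2.0, 3.0])  # hypothetical logits given the first source only
cond_b = np.array([0.0, 1.0, 0.0])  # hypothetical logits given the second source only
guided = multi_source_cfg(uncond, [cond_a, cond_b], [2.0, 1.0])  # [2.0, 5.0, 6.0]
token = sample_token(guided, rng)
```

Separate weights let each conditioning source be strengthened or relaxed independently; setting a source's weight to 0 removes its influence on sampling.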

Check out the Paper and Project. All credit for this research goes to the researchers of this project.


Hello, my name is Adnan Hassan. I am a consulting intern at Marktechpost and soon to be a management trainee at American Express. I am currently pursuing a dual degree at the Indian Institute of Technology, Kharagpur. I am passionate about technology and want to create new products that make a difference.


Copyright © 2023 PouroverAI News.