Saturday, May 17, 2025
News PouroverAI

What Happens When We Train AI on AI-Generated Data?

April 19, 2024
in Data Science & ML


In artificial intelligence (AI), and for large language models (LLMs) in particular, a key requirement for building generative systems is finding suitable training data. As generative AI models such as ChatGPT and DALL-E have advanced, so has the temptation to use AI-generated outputs as training data for new AI systems. However, recent research has highlighted the dangers of this practice, which can lead to a phenomenon known as “model collapse.” A study published in July 2023 by researchers at Rice University and Stanford University, titled “Self-Consuming Generative Models Go MAD,” concluded that training AI models exclusively on generative AI outputs is not advisable.

When an AI model is trained on data generated by other AI models, it ends up learning from a distorted reflection of itself. As in the game of “telephone,” each iteration of AI-generated data becomes more corrupted and detached from reality. Researchers have reported that even a small amount of AI-generated content in the training data can be detrimental to the model, causing its outputs to degrade quickly into nonsensical gibberish. This happens because the errors and biases present in the synthetic data are magnified as each model learns from previously generated outputs.

Model collapse has been observed across various types of AI models, from language models to image generators. While larger, more powerful models may show some resistance, there is little evidence to suggest they are immune to the problem. As AI-generated content becomes more widespread, future AI models are likely to be trained on a combination of real and synthetic data. This creates an “autophagous” (self-consuming) loop in which the quality and diversity of the model’s outputs can deteriorate over successive generations.

Researchers at Rice University and Stanford University conducted a detailed analysis of self-consuming generative image models trained on their own synthetic outputs. They identified three main types of self-consuming loops:

Fully Synthetic Loops: In these loops, models are exclusively trained on synthetic data generated by previous models. It was found that these loops inevitably lead to Model Autophagy Disorder (MAD), with the quality or diversity of generated images progressively decreasing over generations.

Synthetic Augmentation Loops: These loops incorporate a fixed set of real training data along with synthetic data, delaying but not preventing MAD.

Fresh Data Loops: In these loops, each generation of the model has access to new, previously unseen real training data, preventing MAD and maintaining the quality and diversity of generated images over generations.
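
The three loop types above can be sketched with a toy simulation. This is not the experiment from the Rice and Stanford study, which used generative image models. It is a minimal illustration that assumes a one-dimensional Gaussian stands in for the generative model, that "training" means fitting a mean and standard deviation, and that real and synthetic samples are mixed in equal amounts. The function name `run_loop` and all of its parameters are hypothetical.

```python
import numpy as np


def run_loop(kind, generations=1000, n=50, seed=0):
    """Return the fitted standard deviation after each generation.

    kind is one of "fully_synthetic", "synthetic_augmentation", "fresh_data".
    The "model" is a Gaussian fitted by maximum likelihood (an assumption
    made for illustration only).
    """
    rng = np.random.default_rng(seed)
    real_mean, real_std = 0.0, 1.0  # the "real" data distribution

    # Generation 0 is trained on a fixed sample of real data.
    fixed_real = rng.normal(real_mean, real_std, n)
    mean, std = fixed_real.mean(), fixed_real.std()
    history = [std]

    for _ in range(generations):
        # The current model generates synthetic data for the next model.
        synthetic = rng.normal(mean, std, n)

        if kind == "fully_synthetic":
            train = synthetic
        elif kind == "synthetic_augmentation":
            # The same fixed real sample is reused every generation.
            train = np.concatenate([fixed_real, synthetic])
        elif kind == "fresh_data":
            # New, previously unseen real data arrives every generation.
            fresh = rng.normal(real_mean, real_std, n)
            train = np.concatenate([fresh, synthetic])
        else:
            raise ValueError(f"unknown loop type: {kind}")

        # "Train" the next generation by refitting the Gaussian.
        mean, std = train.mean(), train.std()
        history.append(std)

    return np.array(history)


if __name__ == "__main__":
    for kind in ("fully_synthetic", "synthetic_augmentation", "fresh_data"):
        stds = run_loop(kind)
        print(f"{kind}: final fitted std = {stds[-1]:.3g}")
```

In this toy setting the fitted spread in the fully synthetic loop shrinks toward zero, a simple analogue of lost diversity, while the fresh data loop stays close to the real value of 1. The toy is too simple to reproduce the study's finding that a fixed real dataset only delays MAD, so the synthetic augmentation branch should be read as showing how that training set is assembled, not as evidence about its long-run behavior.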

In July 2023, leading AI companies made voluntary commitments at the White House to introduce measures such as watermarking to distinguish synthetic data from authentic data. This approach aims to help users identify artificially generated content and to limit the negative effects of synthetic data on the internet. Watermarking could also serve as a preventive measure against training generative models on AI-generated data, although its effectiveness in tackling MADness requires further investigation.

It is crucial to maintain a balance of real and synthetic content in training data, with proper representation of minority groups. Companies must curate datasets carefully and monitor for signs of degradation to prevent AI systems from becoming biased and unreliable. Responsible data curation and monitoring can guide the development of AI in a grounded direction that serves diverse community needs.
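
One way to picture this kind of curation and monitoring is the sketch below. It is an illustration under stated assumptions, not a recommended pipeline: it assumes each record already carries a boolean `is_synthetic` flag (for example from a watermark detector, which the article does not specify), it uses a distinct n-gram ratio as a crude diversity signal for text outputs, and the default thresholds are arbitrary. All function names are hypothetical.

```python
import random


def cap_synthetic(records, max_synthetic_fraction=0.2, seed=0):
    """Keep every real record and limit the share of synthetic records.

    Each record is a dict with a hypothetical boolean key "is_synthetic".
    """
    if not 0 <= max_synthetic_fraction < 1:
        raise ValueError("max_synthetic_fraction must be in [0, 1)")
    real = [r for r in records if not r["is_synthetic"]]
    synthetic = [r for r in records if r["is_synthetic"]]
    # Largest s with s / (len(real) + s) <= max_synthetic_fraction.
    # int() rounds down, which keeps the result at or below the cap.
    limit = int(max_synthetic_fraction * len(real) / (1 - max_synthetic_fraction))
    rng = random.Random(seed)
    kept = rng.sample(synthetic, min(limit, len(synthetic)))
    return real + kept


def distinct_ngram_ratio(texts, n=2):
    """Unique n-grams divided by total n-grams across all texts."""
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


def diversity_dropped(baseline, current, tolerance=0.1):
    """Flag a relative drop in diversity larger than the tolerance."""
    return current < baseline * (1 - tolerance)
```

A team might compute `distinct_ngram_ratio` on a fixed set of prompts for each new model generation and compare it against the first generation with `diversity_dropped`. A falling ratio is only a warning sign that warrants closer inspection. It is not a diagnosis of model collapse, and image models would need a different diversity measure.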

About the Author

Ranjeeta Bhattacharya is a senior data scientist at BNY Mellon, with over 15 years of experience in data science and technology consulting roles. She holds degrees in Computer Science and Data Science, along with various certifications in these fields, demonstrating a commitment to continuous learning and knowledge sharing.
