Saturday, June 28, 2025
News PouroverAI
Visit PourOver.AI
No Result
View All Result
  • Home
  • AI Tech
  • Business
  • Blockchain
  • Data Science & ML
  • Cloud & Programming
  • Automation
  • Front-Tech
  • Marketing
  • Home
  • AI Tech
  • Business
  • Blockchain
  • Data Science & ML
  • Cloud & Programming
  • Automation
  • Front-Tech
  • Marketing
News PouroverAI
No Result
View All Result

Researchers from Nankai University and ByteDance Introduce ‘ChatAnything’: A Novel AI Framework Dedicated to the Generation of LLM-Enhanced Personas

November 22, 2023
in AI Technology
Reading Time: 4 mins read
0 0
A A
0
Share on FacebookShare on Twitter


A team of researchers from Nankai University and ByteDance introduced a novel framework called ChatAnything, designed to generate anthropomorphized personas for large language model (LLM)-based characters in an online manner. The aim is to create personas with customized visual appearance, personality, and tones based solely on text descriptions. The researchers leverage the in-context learning capability of LLMs to generate personalities using carefully designed system prompts. They propose two innovative concepts: the mixture of voices (MoV) and the mixture of diffusers (MoD) for diverse voice and appearance generation.

MoV employs text-to-speech (TTS) algorithms with pre-defined tones, selecting the most matching one based on user-provided text descriptions. MoD combines text-to-image generation techniques and talking head algorithms to streamline the process of generating talking objects. However, the researchers observe a challenge where anthropomorphic objects generated by current models are often undetectable by pre-trained face landmark detectors, leading to failure in face motion generation. To address this, they incorporate pixel-level guidance during image generation to infuse human face landmarks. This pixel-level injection significantly increases the face landmark detection rate, enabling automatic face animation based on generated speech content.

The paper discusses recent advancements in large language models (LLMs) and their in-context learning capabilities, positioning them at the forefront of academic discussions. The researchers emphasize the need for a framework that generates LLM-enhanced personas with customized personalities, voices, and visual appearances. For personality generation, they leverage the in-context learning capability of LLMs, creating a pool of voice modules using text-to-speech (TTS) APIs. The mixture of voices (MoV) module selects tones based on user text inputs.

The visual appearance of speech-driven talking motions and expressions is addressed using recent talking head algorithms. However, the researchers encounter challenges when using images generated by diffusion models as input for talking head models. Only 30% of images are detectable by state-of-the-art talking head models, indicating a distribution misalignment. To bridge this gap, the researchers propose a zero-shot method, injecting face landmarks during the image generation phase.

The proposed ChatAnything framework comprises four main blocks: LLM-based control module, portrait initializer, mixture of text-to-speech modules, and motion generation module. The researchers incorporated diffusion models, voice changers, and structural control to create a modular and flexible system. To validate the effectiveness of guided diffusion, the researchers created a validation dataset with prompts from different categories. They use a pre-trained face keypoint detector to assess face landmark detection rates, showcasing the impact of their proposed method.

The researchers introduce a comprehensive framework, ChatAnything, for generating LLM-enhanced personas with anthropomorphic characteristics. They address challenges in face landmark detection and propose innovative solutions, presenting promising results in their validation dataset. This work opens avenues for future research in integrating generative models with talking head algorithms and improving the alignment of data distributions.

Check out the Paper and Project. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 33k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter..

\"\"

Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology(IIT), Kharagpur. She is a tech enthusiast and has a keen interest in the scope of software and data science applications. She is always reading about the developments in different field of AI and ML.

🔥 Join The AI Startup Newsletter To Learn About Latest AI Startups



Source link

Tags: ByteDanceChatAnythingDedicatedFrameworkGenerationIntroduceLLMEnhancedNankaiPersonasResearchersUniversity
Previous Post

“The Crypto Opportunity EVEN BIGGER Than Bitcoin” – Elon Musk 2023 Prediction

Next Post

Future of data science jobs

Related Posts

How insurance companies can use synthetic data to fight bias
AI Technology

How insurance companies can use synthetic data to fight bias

June 10, 2024
From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset
AI Technology

From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

June 10, 2024
How Game Theory Can Make AI More Reliable
AI Technology

How Game Theory Can Make AI More Reliable

June 9, 2024
Decoding Decoder-Only Transformers: Insights from Google DeepMind’s Paper
AI Technology

Decoding Decoder-Only Transformers: Insights from Google DeepMind’s Paper

June 9, 2024
Buffer of Thoughts (BoT): A Novel Thought-Augmented Reasoning AI Approach for Enhancing Accuracy, Efficiency, and Robustness of LLMs
AI Technology

Buffer of Thoughts (BoT): A Novel Thought-Augmented Reasoning AI Approach for Enhancing Accuracy, Efficiency, and Robustness of LLMs

June 9, 2024
Deciphering Doubt: Navigating Uncertainty in LLM Responses
AI Technology

Deciphering Doubt: Navigating Uncertainty in LLM Responses

June 9, 2024
Next Post
Future of data science jobs

Future of data science jobs

Nick Diakopoulos: Automating the News: How Algorithms Are Rewriting the Media

Nick Diakopoulos: Automating the News: How Algorithms Are Rewriting the Media

Businesses of The Future: This Generations Opportunity

Businesses of The Future: This Generations Opportunity

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

  • Trending
  • Comments
  • Latest
23 Plagiarism Facts and Statistics to Analyze Latest Trends

23 Plagiarism Facts and Statistics to Analyze Latest Trends

June 4, 2024
How ‘Chain of Thought’ Makes Transformers Smarter

How ‘Chain of Thought’ Makes Transformers Smarter

May 13, 2024
Amazon’s Bedrock and Titan Generative AI Services Enter General Availability

Amazon’s Bedrock and Titan Generative AI Services Enter General Availability

October 2, 2023
Is C.AI Down? Here Is What To Do Now

Is C.AI Down? Here Is What To Do Now

January 10, 2024
The Importance of Choosing a Reliable Affiliate Network and Why Olavivo is Your Ideal Partner

The Importance of Choosing a Reliable Affiliate Network and Why Olavivo is Your Ideal Partner

October 30, 2023
Managing PDFs in Node.js with pdf-lib

Managing PDFs in Node.js with pdf-lib

November 16, 2023
Can You Guess What Percentage Of Their Wealth The Rich Keep In Cash?

Can You Guess What Percentage Of Their Wealth The Rich Keep In Cash?

June 10, 2024
AI Compared: Which Assistant Is the Best?

AI Compared: Which Assistant Is the Best?

June 10, 2024
How insurance companies can use synthetic data to fight bias

How insurance companies can use synthetic data to fight bias

June 10, 2024
5 SLA metrics you should be monitoring

5 SLA metrics you should be monitoring

June 10, 2024
From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

June 10, 2024
UGRO Capital: Targeting to hit milestone of Rs 20,000 cr loan book in 8-10 quarters: Shachindra Nath

UGRO Capital: Targeting to hit milestone of Rs 20,000 cr loan book in 8-10 quarters: Shachindra Nath

June 10, 2024
Facebook Twitter LinkedIn Pinterest RSS
News PouroverAI

The latest news and updates about the AI Technology and Latest Tech Updates around the world... PouroverAI keeps you in the loop.

CATEGORIES

  • AI Technology
  • Automation
  • Blockchain
  • Business
  • Cloud & Programming
  • Data Science & ML
  • Digital Marketing
  • Front-Tech
  • Uncategorized

SITEMAP

  • Disclaimer
  • Privacy Policy
  • DMCA
  • Cookie Privacy Policy
  • Terms and Conditions
  • Contact us

Copyright © 2023 PouroverAI News.
PouroverAI News

No Result
View All Result
  • Home
  • AI Tech
  • Business
  • Blockchain
  • Data Science & ML
  • Cloud & Programming
  • Automation
  • Front-Tech
  • Marketing

Copyright © 2023 PouroverAI News.
PouroverAI News

Welcome Back!

Login to your account below

Forgotten Password? Sign Up

Create New Account!

Fill the forms bellow to register

All fields are required. Log In

Retrieve your password

Please enter your username or email address to reset your password.

Log In