Can Large Language Models Retain Old Skills While Learning New Ones? This Paper Introduces LLaMA Pro-8.3B: A New Frontier in AI Adaptability

January 9, 2024
in AI Technology


Large Language Models (LLMs) and Their Application in Natural Language Processing (NLP)

Large Language Models (LLMs) have revolutionized the field of Natural Language Processing (NLP) and the way humans interact with machines. They now handle tasks such as question answering, text generation, text summarization, and code completion with a breadth of capability earlier NLP systems could not match.

However, general-purpose LLMs fall short in specialized domains such as programming, mathematics, biomedicine, and finance. To close this gap, domain-adaptive pretraining methods have been developed that continue training an LLM on domain-specific corpora, at a far lower computational cost than pretraining from scratch.

The Challenge of Catastrophic Forgetting

During post-pretraining, that is, further training an already pretrained model on new data, LLMs face a challenge known as catastrophic forgetting: the general abilities acquired during pretraining deteriorate as the model absorbs the new domain. This makes it difficult for such models to perform well across the full range of tasks. What is needed is a technique that injects domain-specific knowledge into an LLM without compromising its overall capabilities.

Introducing Block Expansion for LLMs

A team of researchers has proposed a new post-pretraining technique for LLMs called block expansion. The idea is to grow a pretrained LLM by interleaving duplicates of its existing Transformer blocks into the network: the copied blocks absorb new, domain-specific knowledge during training, while the original blocks preserve the model's general capabilities, avoiding catastrophic forgetting.
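
To make the idea concrete, here is a minimal PyTorch sketch of block expansion. It assumes LLaMA-style decoder layers whose residual-stream projections are named self_attn.o_proj and mlp.down_proj (as in common open implementations); the interleaving pattern and function name are our illustration, not the authors' released code.

```python
import copy

import torch.nn as nn


def expand_blocks(layers: nn.ModuleList, num_new: int) -> nn.ModuleList:
    """Interleave `num_new` zero-initialized copies among existing blocks.

    Assumes LLaMA-style decoder layers: zeroing `self_attn.o_proj` and
    `mlp.down_proj` makes each copy's residual branches output zero, so
    a new block initially computes an identity mapping.
    """
    step = len(layers) // num_new  # one copy after every `step` original blocks
    expanded, added = [], 0
    for i, layer in enumerate(layers):
        expanded.append(layer)
        if (i + 1) % step == 0 and added < num_new:
            new_layer = copy.deepcopy(layer)
            # Zero the linear layers that write into the residual stream.
            nn.init.zeros_(new_layer.self_attn.o_proj.weight)
            nn.init.zeros_(new_layer.mlp.down_proj.weight)
            expanded.append(new_layer)
            added += 1
    return nn.ModuleList(expanded)
```

With 32 original layers and num_new=8, this yields a 40-layer model whose forward pass is initially indistinguishable from the original, roughly the configuration behind the 8.3B variant.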

In this technique, only the newly inserted blocks are fine-tuned, using domain-specific corpora; all of the original blocks remain frozen. Because each new block's output linear layers are zero-initialized, it computes an identity mapping before training and the expanded model behaves exactly like the original. The result is an extended pretrained model that performs well on both general and domain-specific tasks.
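
A matching sketch of the fine-tuning setup, freezing everything except the duplicated blocks. Here `new_block_indices` would be the positions at which copies were inserted, and the learning rate is purely illustrative; both are our assumptions, not values from the paper.

```python
import torch
import torch.nn as nn


def freeze_for_domain_tuning(model: nn.Module,
                             layers: nn.ModuleList,
                             new_block_indices: set) -> torch.optim.Optimizer:
    """Freeze every parameter, then re-enable only the inserted blocks."""
    for p in model.parameters():
        p.requires_grad = False  # embeddings, norms, head, original blocks
    for idx in new_block_indices:
        for p in layers[idx].parameters():
            p.requires_grad = True  # only the zero-initialized copies train
    # Hand the optimizer just the trainable parameters (illustrative lr).
    return torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=2e-4
    )
```

Training then proceeds as ordinary language modeling on the domain corpus; since the frozen blocks never receive gradient updates, the model's general abilities are preserved by construction.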

Introducing LLAMA PRO

The researchers introduce a family of LLAMA PRO models built with this technique. By expanding the base LLaMA model with additional blocks and post-pretraining them on code and math corpora, they obtain LLAMA PRO-8.3B, an adaptable foundation model that performs well on general tasks, programming, and mathematics. Because only the extended blocks are fine-tuned on the fresh corpus, the risk of catastrophic forgetting is reduced, and the model stays proficient in both new and existing knowledge.

LLAMA PRO and its instruction-following counterpart, LLAMA PRO-INSTRUCT, have exhibited superior performance across multiple benchmarks, outperforming existing open models in the LLaMA family and showcasing the models' potential for reasoning and for handling a variety of tasks as intelligent agents.
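
For readers who want to try the model, a released checkpoint can be loaded with the Hugging Face transformers library in the standard way. The repository identifier below is our best guess at where the weights are published, so verify it against the paper's release links before relying on it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TencentARC/LLaMA-Pro-8B"  # assumed repo id; confirm via the paper
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A code-completion style prompt, playing to the model's strengths.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```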

Primary Contributions

The primary contributions of this study can be summarized as follows:

  • Introduction of the block expansion technique for LLMs, allowing for the incorporation of new information without sacrificing existing capabilities.
  • Introduction of flexible models like LLAMA PRO and LLAMA PRO-INSTRUCT, which seamlessly combine programming and natural languages.
  • A thorough benchmarking of the LLAMA PRO family on various datasets, including agent-oriented and traditional workloads.
  • Demonstration of LLAMA PRO’s superiority and potential in handling complex and diverse applications.

In conclusion, this study provides valuable insights into the interplay between programming and natural languages. It lays the foundation for the development of sophisticated language agents that can function effectively in different settings. The study also highlights the importance of addressing the flaws in LLMs’ learning processes and offers a promising path towards creating more flexible and powerful language models.

For more information, please read the paper. All credit for this research goes to the researchers involved in this project.


About the Author

Tanya Malhotra is a final year undergraduate student at the University of Petroleum & Energy Studies, Dehradun. She is pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning. Tanya is a Data Science enthusiast with strong analytical and critical thinking skills. She has a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.
