Large Language Models (LLMs) and their Application in Natural Language Processing (NLP)
Large Language Models (LLMs) have revolutionized the field of Natural Language Processing (NLP) and the way humans interact with machines. They have greatly expanded what machines can do in tasks such as question answering, text generation, text summarization, and code completion.
However, LLMs still fall short in specialized domains such as programming, mathematics, the biomedical sciences, and finance. To address this, domain-adaptive pretraining methods have been developed that improve LLMs with domain-specific corpora at a much lower computational cost than training from scratch.
The Challenge of Catastrophic Forgetting
During post-pretraining, when an LLM is further trained on domain-specific data, it often suffers from catastrophic forgetting: the general abilities acquired during initial pretraining deteriorate, making it hard for the model to perform well across a broad range of tasks. What is needed, therefore, is a technique that injects domain-specific knowledge into an LLM without compromising its overall capabilities.
Introducing Block Expansion for LLMs
A team of researchers has proposed a new post-pretraining technique for LLMs called block expansion. Copies of existing Transformer blocks are interleaved into a pre-trained LLM to deepen it, allowing new information to be added effectively and efficiently without catastrophic forgetting, so the model keeps its general capabilities while gaining domain-specific knowledge.
In this technique, only the newly inserted blocks are fine-tuned on domain-specific corpora, while the original blocks remain frozen. The copied blocks have their output linear layers zero-initialized, so each one starts out as an identity mapping and the expanded model initially behaves exactly like the original. The result is an extended pre-trained model that performs well on both general and domain-specific tasks, as sketched below.
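As an illustration, here is a minimal PyTorch sketch of block expansion. It assumes a LLaMA-style decoder whose blocks use residual connections of the form x + f(x); the attribute names (`self_attn.o_proj`, `mlp.down_proj`) and the `group_size` parameter are assumptions for illustration, not the authors' exact implementation.

```python
import copy
import torch.nn as nn

def make_identity_copy(block: nn.Module) -> nn.Module:
    """Duplicate a Transformer block and zero its output projections so the
    copy initially acts as an identity mapping (residual: x + 0 = x)."""
    new_block = copy.deepcopy(block)
    nn.init.zeros_(new_block.self_attn.o_proj.weight)  # attention output projection -> 0
    nn.init.zeros_(new_block.mlp.down_proj.weight)     # MLP down projection -> 0
    return new_block

def expand_blocks(blocks: nn.ModuleList, group_size: int) -> nn.ModuleList:
    """Interleave one zero-initialized copy after every `group_size` original
    blocks, deepening the model without changing its initial outputs."""
    expanded = []
    for i, block in enumerate(blocks):
        expanded.append(block)
        if (i + 1) % group_size == 0:
            expanded.append(make_identity_copy(block))
    return nn.ModuleList(expanded)
```

Because every added block starts out as an identity function, the expanded model produces the same outputs as the original before any domain-specific training begins.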
Introducing LLAMA PRO
The researchers introduce the LLAMA PRO family of models in this study. By expanding a pre-trained LLaMA model and post-pretraining the added blocks on code and math corpora, they developed LLAMA PRO-8.3B, a versatile foundation model that performs exceptionally well on general tasks, programming, and mathematics. Because only the extended blocks are fine-tuned on the fresh corpus, the risk of catastrophic forgetting is reduced, and the model stays proficient in both new and existing knowledge.
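To make the selective fine-tuning step concrete, the sketch below freezes every original parameter and leaves only the newly inserted blocks trainable; `model.layers`, the `new_block_ids` index set, and the optimizer settings are illustrative assumptions, not the authors' actual code.

```python
import torch

def freeze_except_new_blocks(model: torch.nn.Module, new_block_ids: set) -> list:
    """Freeze all parameters, then re-enable gradients only for the newly
    inserted blocks so domain fine-tuning cannot disturb the original weights."""
    for p in model.parameters():
        p.requires_grad = False                  # freeze the whole model
    for i in new_block_ids:                      # indices of the added blocks
        for p in model.layers[i].parameters():
            p.requires_grad = True
    # return only the trainable parameters for the optimizer
    return [p for p in model.parameters() if p.requires_grad]

# Illustrative usage: train only the added blocks on the domain corpus.
# trainable = freeze_except_new_blocks(model, new_block_ids={8, 17, 26, 35})
# optimizer = torch.optim.AdamW(trainable, lr=2e-4)
```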
LLAMA PRO and its instruction-following counterpart, LLAMA PRO-INSTRUCT, have exhibited superior performance on multiple benchmarks, outperforming existing open models in the LLaMA family and showcasing their potential for reasoning and for handling a variety of tasks as intelligent agents.
Primary Contributions
The primary contributions of this study can be summarized as follows:
- Introduction of the block expansion technique for LLMs, allowing for the incorporation of new information without sacrificing existing capabilities.
- Introduction of flexible models like LLAMA PRO and LLAMA PRO-INSTRUCT, which seamlessly combine programming and natural languages.
- A thorough benchmarking of the LLAMA PRO family on various datasets, including agent-oriented and traditional workloads.
- Demonstration of LLAMA PRO’s superiority and potential in handling complex and diverse applications.
In conclusion, this study provides valuable insights into the interplay between programming and natural languages and lays the foundation for developing sophisticated language agents that can function effectively in different settings. It also highlights the importance of letting LLMs acquire new knowledge without losing existing capabilities, and it offers a promising path toward more flexible and powerful language models.
For more information, please read the paper. All credit for this research goes to the researchers involved in this project.
About the Author
Tanya Malhotra is a final year undergraduate student at the University of Petroleum & Energy Studies, Dehradun. She is pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning. Tanya is a Data Science enthusiast with strong analytical and critical thinking skills. She has a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.