In a blog post, Yun Zhu and Lijuan Liu, both Software Engineers at Google Research, discuss how advancements in Large Language Models (LLMs) have led to a new paradigm that unifies various natural language processing (NLP) tasks within an instruction-following framework. Recent multi-task LLMs such as T0, FLAN, and OPT-IML exemplify this paradigm.
The process begins with gathering multi-task data, in which each task follows a task-specific template, so that each labeled example is converted into an instruction paired with a corresponding response. These instruction-response pairs are used to train the LLM, resulting in a conditional generation model that takes an instruction as input and generates a response.
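To make the conversion concrete, here is a minimal sketch of how a labeled example might be filled into a task-specific template; the template text and field names are illustrative, not taken from the actual T0 or FLAN templates.

```python
def to_instruction_response(example: dict, template: str) -> tuple[str, str]:
    """Fill a task-specific template with the fields of a labeled example."""
    instruction = template.format(**example)
    return instruction, example["label"]

# Hypothetical template for a sentiment-classification task.
SENTIMENT_TEMPLATE = (
    "Review: {text}\n"
    "Is the sentiment of this review positive or negative?"
)

example = {"text": "The plot was thin, but the acting was superb.", "label": "positive"}
instruction, response = to_instruction_response(example, SENTIMENT_TEMPLATE)
# The resulting (instruction, response) pair becomes one training example for
# the conditional generation model: instruction in, response out.
```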
Multi-task LLMs have shown remarkable task-wise generalization, addressing unseen tasks by understanding and following brand-new instructions. Instruction-following pre-training, as demonstrated by models like FLAN, yields improved performance on such unseen tasks.
Due to the complexity of understanding and solving various tasks solely from instructions, multi-task LLMs typically have a large number of parameters, ranging from several billion to hundreds of billions. Operating such sizable models is challenging: they demand significant computational power and memory, making training and inference expensive and inefficient.
To address these challenges, the engineers propose Cappy, a lightweight pre-trained scorer with only 360 million parameters. Cappy takes an instruction and a candidate response as input and produces a score between 0 and 1 indicating the estimated correctness of the response with respect to the instruction. It can function independently on classification tasks or serve as an auxiliary component that boosts the performance of LLMs.
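The described interface suggests a RoBERTa-style encoder with a regression head. The sketch below mirrors that interface (instruction and candidate response in, a score in [0, 1] out); it is an assumption-laden approximation, not the released Cappy implementation.

```python
import torch
from transformers import RobertaModel, RobertaTokenizer

class CappyStyleScorer(torch.nn.Module):
    """RoBERTa encoder plus a single-logit head squashed to (0, 1)."""

    def __init__(self, encoder_name: str = "roberta-large"):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(encoder_name)
        self.head = torch.nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask=attention_mask)
        cls = hidden.last_hidden_state[:, 0]              # [CLS] representation
        return torch.sigmoid(self.head(cls)).squeeze(-1)  # correctness estimate

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
scorer = CappyStyleScorer()

# Score one (instruction, candidate response) pair.
batch = tokenizer(
    "Is the sentiment of this review positive or negative? Review: I loved it.",
    "positive",  # candidate response, encoded as the second segment
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    score = scorer(batch["input_ids"], batch["attention_mask"])
```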
Cappy enables downstream supervision without requiring fine-tuning of the LLM itself, reducing memory requirements and avoiding back-propagation through the LLM's parameters. As a result, it also works with closed-source multi-task LLMs that are accessible only through WebAPIs.
The engineers pre-trained Cappy on a diverse collection of datasets, where each pre-training instance consists of an instruction-response pair annotated with a correctness score. For weak supervision, they used Rouge-L to measure the similarity between candidate responses and ground-truth responses. The continuous pre-training of Cappy on top of the RoBERTa model was conducted on Google's TPU-v4.
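A minimal sketch of how the Rouge-L weak-labeling step could look, assuming the `rouge_score` package; the example strings are invented, and the real pipeline draws candidate responses from the pre-training datasets and model samples.

```python
from rouge_score import rouge_scorer

rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def weak_label(candidate: str, ground_truth: str) -> float:
    """Rouge-L F-measure in [0, 1] as a proxy for response correctness."""
    return rouge.score(ground_truth, candidate)["rougeL"].fmeasure

instruction = "Summarize: The cat sat on the mat all afternoon."
ground_truth = "A cat rested on a mat."
candidates = ["A cat rested on a mat.", "Dogs ran around the park."]

# Each pre-training instance pairs the instruction and a response with a
# soft correctness annotation derived from Rouge-L.
instances = [(instruction, c, weak_label(c, ground_truth)) for c in candidates]
```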
Cappy can be applied to practical tasks through a candidate-selection mechanism: given an instruction, it scores a set of candidate responses, and the highest-scoring one is selected. It can also be fine-tuned to integrate downstream task information into LLM predictions, boosting performance on those tasks.
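The candidate-selection mechanism fits in a few lines; `generate_candidates` and `cappy_score` below are hypothetical stand-ins for the LLM sampler and the Cappy scorer rather than real APIs.

```python
from typing import Callable

def select_best_response(
    instruction: str,
    generate_candidates: Callable[[str, int], list[str]],
    cappy_score: Callable[[str, str], float],
    num_candidates: int = 8,
) -> str:
    """Sample candidates from the LLM and return the one Cappy rates highest."""
    candidates = generate_candidates(instruction, num_candidates)
    return max(candidates, key=lambda c: cappy_score(instruction, c))
```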
Overall, Cappy, with its scoring-based pre-training strategy, has been shown to outperform existing multi-task LLMs in both performance and parameter efficiency. It offers a new approach to adapting LLMs with downstream supervision, reducing memory requirements and improving performance on complex tasks.