With the recent introduction of Large Language Models (LLMs), the field of Artificial Intelligence (AI) has advanced significantly. Although these models perform impressively on tasks like content generation and question answering, they still struggle with complicated, open-ended queries that require interaction with other tools or APIs.
Outcome-based supervision, where feedback is easily obtained, is effective for simpler tasks. For more complex problems, process supervision, which defines workflows through human-understandable task decompositions, is more helpful. These workflows, called LLM agents, use external tools or APIs to carry out multi-step processes and accomplish a goal. The sample task considered here is answering complicated queries by gathering data through a search API and composing a paragraph-long response.
Existing models can answer complex natural language questions that require multi-step reasoning and the integration of external information, but they still fail in some cases. Because interactions with external knowledge are non-differentiable, training these models end-to-end to correct such failures is not simple.
To address these challenges, a team of researchers from Google has proposed a ReAct-style LLM agent that can reason and act on external knowledge. Because it can manage multi-step procedures, the ReAct-style agent can respond efficiently to intricate queries.
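The reason-then-act loop behind a ReAct-style agent can be illustrated with a minimal sketch. This is not the researchers' implementation: `run_react_agent`, `Step`, `toy_policy`, and `toy_search` are hypothetical names, the policy stands in for an LLM call, and the search function stands in for a real search API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    thought: str           # the agent's reasoning for this step
    action: str            # "search" or "finish"
    argument: str          # search query, or the final answer when finishing
    observation: str = ""  # tool output, filled in after a search


def run_react_agent(
    question: str,
    policy: Callable[[str, List[Step]], Step],
    search: Callable[[str], str],
    max_steps: int = 5,
) -> Tuple[str, List[Step]]:
    """Alternate between reasoning (policy) and acting (search) until the policy finishes."""
    trajectory: List[Step] = []
    for _ in range(max_steps):
        step = policy(question, trajectory)
        if step.action == "finish":
            trajectory.append(step)
            return step.argument, trajectory
        # Any non-finish action is treated as a search in this toy version.
        step.observation = search(step.argument)
        trajectory.append(step)
    return "", trajectory  # step budget exhausted without an answer


# Hypothetical stand-ins for a search API and an LLM policy.
TOY_INDEX: Dict[str, str] = {"capital of France": "Paris is the capital of France."}


def toy_search(query: str) -> str:
    return TOY_INDEX.get(query, "no results")


def toy_policy(question: str, trajectory: List[Step]) -> Step:
    if not trajectory:
        return Step("I need to look this up.", "search", "capital of France")
    return Step("The snippet answers the question.", "finish", trajectory[-1].observation)


answer, steps = run_react_agent("What is the capital of France?", toy_policy, toy_search)
print(answer)
```

The returned trajectory records every thought, action, and observation, which is the kind of trace an iterative training method could later reuse.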
The team has also presented a ReST-like method to further improve performance and handle failure cases. It uses a growing-batch reinforcement learning strategy with AI feedback, which allows iterative training on previous trajectories. The main aim is to enable the agent to continuously improve and distill itself over time.
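As a rough illustration of the iterative idea, the sketch below alternates between growing a batch of trajectories from the current model and improving the model on the ones a feedback scorer rates highly. It is a toy under stated assumptions, not the paper's algorithm: `rest_style_loop`, the threshold filter, and all `toy_*` helpers are hypothetical, the "model" is a plain dictionary, and the scorer is a stub standing in for AI feedback.

```python
import random
from typing import Callable, Dict, List, Tuple

Trajectory = Tuple[str, str]  # (question, answer); real trajectories hold full reasoning traces
Model = Dict[str, str]        # toy "model": memorized question -> answer pairs


def rest_style_loop(
    model: Model,
    questions: List[str],
    sample: Callable[[Model, str], str],
    score: Callable[[Trajectory], float],
    fine_tune: Callable[[Model, List[Trajectory]], Model],
    iterations: int = 2,
    threshold: float = 0.5,
) -> Model:
    for _ in range(iterations):
        # Grow: collect new trajectories from the current model.
        grown = [(q, sample(model, q)) for q in questions]
        # Improve: keep trajectories the feedback scorer rates well, then fine-tune on them.
        kept = [t for t in grown if score(t) >= threshold]
        model = fine_tune(model, kept)
    return model


# Hypothetical stand-ins for sampling, AI feedback, and fine-tuning.
rng = random.Random(0)
CANDIDATES = ["Paris", "Rome", "Berlin"]
PREFERRED = {"capital of France?": "Paris", "capital of Italy?": "Rome"}


def toy_sample(model: Model, question: str) -> str:
    # Reuse a memorized answer if there is one, otherwise guess.
    return model.get(question) or rng.choice(CANDIDATES)


def toy_score(trajectory: Trajectory) -> float:
    question, answer = trajectory
    return 1.0 if PREFERRED[question] == answer else 0.0


def toy_fine_tune(model: Model, kept: List[Trajectory]) -> Model:
    return {**model, **dict(kept)}


tuned = rest_style_loop({}, list(PREFERRED), toy_sample, toy_score, toy_fine_tune, iterations=5)
print(tuned)
```

Because only well-scored trajectories are ever added, every answer the toy model retains is one the scorer approved, which mirrors the intent of training on filtered, self-generated data without human labels.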
The team has shared that, starting from a prompted large model, a fine-tuned compact model was obtained after just two iterations of the algorithm. Despite having two orders of magnitude fewer parameters, the smaller model demonstrated comparable performance on difficult compositional question-answering benchmarks.
The team has summarized their primary contributions as follows.
- A self-critical ReAct-style agent has been introduced for long-form question answering.
- A proxy evaluation metric for the agent has been proposed, based on the Bamboogle and BamTwoogle datasets and designed for auto-evaluation.
- The agent's performance has been shown to improve through iterative fine-tuning on its own reasoning traces in the ReST manner.
- Stepwise AI feedback has been used to improve the agent, removing the need for human-labeled training data.
- The synthetic data produced during this iterative process has been shown to be effective for distilling the agent into models one or two orders of magnitude smaller, while keeping performance close to that of the pre-trained teacher agent.
In conclusion, this approach combines an iterative training technique, ReST, with an LLM agent designed in the ReAct manner. By incorporating external knowledge and iteratively fine-tuning models, including much smaller ones, this combination helps address the challenges of answering difficult questions and improves performance on demanding benchmarks.
Check out the Paper. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning. She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.