Large language models (LLMs) are increasingly useful for programming and robotics tasks, but they still lag behind humans on complex reasoning problems. Unlike people, these systems struggle to learn new concepts on the fly, which makes it hard for them to form the high-level abstractions that more sophisticated tasks require.
Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have found that natural language offers a rich source of such abstractions. In three papers to appear at the International Conference on Learning Representations (ICLR), the team shows how everyday language can give language models the context they need to build better representations for code synthesis, AI planning, and robotic navigation and manipulation.
Their frameworks, LILO, Ada, and LGA, each build a library of abstractions tailored to its domain: LILO uses natural language to synthesize, compress, and document code; Ada applies the same idea to sequential decision-making for AI agents; and LGA helps robots interpret their environments so they can plan more effectively.
Each system takes a neurosymbolic approach, combining neural networks with program-like logical components. LILO, for example, pairs a standard LLM with the Stitch compression algorithm: the LLM writes candidate programs, Stitch identifies abstractions shared across them, and the LLM then documents those abstractions in plain English, yielding code libraries that are more interpretable and efficient.
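To make that division of labor concrete, here is a minimal Python sketch of a LILO-style loop. The helper names (`llm_synthesize`, `stitch_compress`, `llm_document`) are hypothetical stand-ins, not the system's actual API, and each is stubbed out so the loop runs end to end.

```python
from dataclasses import dataclass, field

@dataclass
class Library:
    abstractions: dict[str, str] = field(default_factory=dict)  # name -> code body
    docs: dict[str, str] = field(default_factory=dict)          # name -> docstring

def llm_synthesize(task: str, library: Library) -> str:
    # Stand-in: prompt an LLM to write a program for `task`,
    # conditioning on the library of named, documented abstractions.
    return f"(solve {task!r})"

def stitch_compress(programs: list[str]) -> dict[str, str]:
    # Stand-in for Stitch: find subexpressions repeated across programs
    # and lift them into reusable abstractions.
    return {"fn_0": "(shared-subroutine ...)"}

def llm_document(name: str, body: str) -> str:
    # Stand-in: ask an LLM for a human-readable description of the
    # abstraction so it can be reused in later prompts.
    return f"{name}: helper extracted from repeated code"

def lilo_iteration(tasks: list[str], library: Library) -> Library:
    # 1. Synthesize: solve tasks with the LLM, reusing known abstractions.
    programs = [llm_synthesize(t, library) for t in tasks]
    # 2. Compress: refactor shared structure into new abstractions.
    for name, body in stitch_compress(programs).items():
        library.abstractions[name] = body
        # 3. Document: attach natural-language docs for future reuse.
        library.docs[name] = llm_document(name, body)
    return library

print(lilo_iteration(["sort a list", "reverse a list"], Library()).docs)
```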
Ada, named after the mathematician Ada Lovelace, builds libraries of reusable plans from natural-language descriptions of tasks. Paired with a language model such as GPT-4, it outperformed baseline AI approaches on kitchen and gaming simulations, improving task accuracy significantly.
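A system in Ada's spirit can be outlined as a propose-and-verify loop: a language model suggests high-level actions, and only those a planner can actually ground into executable steps enter the library. The sketch below is an illustration under assumptions; the function names and operator format are invented for this example.

```python
def llm_propose_operator(task_description: str) -> dict:
    # Stand-in: ask a model such as GPT-4 for a named high-level action
    # with preconditions and effects, given the task description.
    return {
        "name": "brew_coffee",
        "preconditions": ["has(mug)", "has(coffee)"],
        "effects": ["filled(mug, coffee)"],
    }

def planner_verifies(operator: dict) -> bool:
    # Stand-in: try to ground the operator into executable low-level
    # actions in a simulator; reject it if planning fails.
    return True

def grow_plan_library(tasks: list[str]) -> list[dict]:
    library: list[dict] = []
    for task in tasks:
        op = llm_propose_operator(task)
        if planner_verifies(op):  # keep only operators that actually work
            library.append(op)
    return library

print(grow_plan_library(["make a cup of coffee"]))
```

The verification step is what keeps such a library grounded: the language model supplies plausible structure, and the planner discards suggestions that do not hold up in the environment.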
LGA, meanwhile, helps robots interpret their surroundings through natural-language descriptions of the task at hand, letting them ignore irrelevant detail and act more efficiently. By using a language model to generate these task abstractions, LGA simplifies the process of training robots for cluttered, complex environments.
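Conceptually, this kind of abstraction acts like a language-driven filter over the robot's state: features the instruction cares about are kept, and the rest are dropped before the robot plans or imitates. In the illustrative sketch below, a crude keyword match stands in for the language model, and all names are hypothetical rather than LGA's actual interface.

```python
def llm_relevant_features(instruction: str, features: list[str]) -> list[str]:
    # Stand-in for a language model: keep scene features the instruction
    # mentions. A real system would prompt an LLM here instead.
    keywords = instruction.lower().split()
    return [f for f in features if f in keywords]

def abstract_state(instruction: str, scene: dict) -> dict:
    # Build the abstracted state that the robot's policy actually sees.
    keep = llm_relevant_features(instruction, list(scene))
    return {name: scene[name] for name in keep}

scene = {"mug": (0.2, 0.4), "laptop": (0.5, 0.1), "plant": (0.8, 0.3)}
print(abstract_state("pick up the mug", scene))  # -> {'mug': (0.2, 0.4)}
```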
Each framework still has limitations, and the researchers are exploring ways to address them, such as incorporating more powerful language models and multimodal interfaces. Together, though, the three systems mark a significant step toward language models that can handle complex tasks in programming, AI planning, and robotics.