Modern robots can sense their environment and understand human language, but what they don't know is often more important than what they do know. Teaching robots to recognize those gaps and ask for help is crucial to making them safer and more efficient.
Engineers at Princeton University and Google have developed a new method to teach robots to know when they don't know. The technique quantifies the fuzziness of human language instructions and uses that measurement to decide when a robot should ask for further directions. For example, if a robot is told to pick up a bowl from a table holding a single bowl, the instruction is unambiguous. But if there are five bowls on the table, the uncertainty is much higher, and the robot should ask for clarification.
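To make the bowl example concrete, here is a minimal sketch of ambiguity-triggered clarification. Everything in it is illustrative: the probabilities stand in for whatever scores a language model assigns to candidate interpretations, and `should_ask_for_help` and its threshold are hypothetical names, not the authors' code.

```python
# Hypothetical sketch: decide whether an instruction is ambiguous enough
# to warrant a clarifying question. The probabilities below stand in for
# LLM-assigned scores over candidate interpretations.

def should_ask_for_help(probs, threshold=0.8):
    """Ask for clarification unless one interpretation clearly dominates."""
    return max(probs.values()) < threshold

# One bowl on the table: the instruction maps to a single object.
clear = {"pick up the bowl": 0.97}

# Five bowls: the same probability mass is spread across interpretations.
ambiguous = {f"pick up bowl {i}": 0.2 for i in range(1, 6)}

print(should_ask_for_help(clear))      # False -> act on the instruction
print(should_ask_for_help(ambiguous))  # True  -> ask which bowl is meant
```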
Because real-world tasks are usually more complex than a one-step command, the engineers use large language models (LLMs), such as the model behind ChatGPT, to assess uncertainty in complex environments. Although LLMs give robots a powerful grasp of human language, their outputs are often unreliable. Anirudha Majumdar, an assistant professor at Princeton and the senior author of a study on the new method, emphasizes that LLM-based robots need to be aware of their limitations and know when to seek assistance.
The system also lets a robot's user set a target level of success, which is tied to a specific uncertainty threshold that triggers the robot to ask for help. A surgical robot, for instance, has a much lower tolerance for errors than a robot cleaning a living room, so it should ask for help far more readily.
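One plausible way to turn a success target into a threshold is to calibrate on examples where the correct action is known: record the confidence the model assigned to each correct action, then pick the quantile that keeps the desired fraction covered. The sketch below is a simplified illustration of that idea (the study's actual method uses conformal prediction, described further on); `threshold_for` and the calibration scores are invented for the example.

```python
# Simplified calibration sketch: map a user-chosen success target to a
# confidence threshold. The scores are the (invented) confidences the
# model gave to the *correct* action on held-out calibration examples.

def threshold_for(target_success, calibration_scores):
    """Confidence quantile covering `target_success` of calibration cases."""
    scores = sorted(calibration_scores)
    idx = min(int((1.0 - target_success) * len(scores)), len(scores) - 1)
    return scores[idx]

calibration = [0.05, 0.12, 0.20, 0.33, 0.41, 0.50, 0.62, 0.75, 0.83, 0.90]

for target in (0.80, 0.95):  # e.g. living-room cleaner vs. surgical robot
    print(f"target {target:.0%} -> keep actions scoring above "
          f"{threshold_for(target, calibration)}")
```

A stricter success target lowers the bar for which actions count as plausible, so more of them survive the cut and the robot ends up asking for help more often.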
Allen Ren, the study's lead author and a graduate student at Princeton, explains that the goal is for the robot to ask for just enough help to reach the desired level of success while minimizing the total assistance required. Compared with other approaches, the new method achieves high accuracy while relying less on human intervention.
The researchers tested the approach on a simulated robotic arm and on two types of physical robots at Google facilities in New York City and Mountain View, California. The experiments ranged from sorting toy food items into different categories to placing a bowl in a microwave. Using a statistical technique called conformal prediction together with the user-specified success rate, the algorithm kept every candidate action whose probability cleared a calibrated threshold and triggered a request for human help whenever more than one action remained in contention.
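The trigger logic can be pictured as follows: score each candidate next action, keep every action whose score clears the calibrated threshold (the conformal prediction set), and ask a human whenever more than one action survives. The sketch below is a hedged reading of that description; the action names, scores, and `qhat` value are illustrative, not taken from the study.

```python
# Illustrative trigger logic: build a conformal prediction set of candidate
# actions and ask for help when it is not a singleton. All names and
# numbers here are made up for the example.

def prediction_set(action_scores, qhat):
    """Every action whose confidence clears the calibrated threshold."""
    return {a for a, s in action_scores.items() if s >= qhat}

def next_step(action_scores, qhat):
    options = prediction_set(action_scores, qhat)
    if len(options) == 1:
        return ("execute", options.pop())
    return ("ask_human", sorted(options))  # present the plausible options

scores = {
    "place the bowl in the microwave": 0.55,
    "place the cup in the microwave": 0.38,
    "place the plate on the rack": 0.04,
}
print(next_step(scores, qhat=0.30))  # two actions clear 0.30 -> ask a human
```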
The researchers also note that a robot's physical limitations can yield insights that purely abstract systems cannot: large language models may excel in conversation, but their outputs still have to contend with real-world constraints. The collaboration between Ren, Majumdar, and Andy Zeng of Google DeepMind began after Zeng gave a talk in the Princeton Robotics Seminar series, and Ren's interest in calibrating how much assistance a robot should seek led to the development of the new method.
Ren is now extending the work to challenges in active perception for robots, such as predicting the location of objects within a house from vision and language information, which poses new difficulties in estimating uncertainty and deciding when to request help.