Conversational agents (CAs) such as Alexa and Siri are designed to answer questions, offer suggestions, and even display empathy. However, new research shows they fare poorly compared with humans when it comes to interpreting and exploring a user’s experience.
CAs are powered by large language models (LLMs), which ingest massive amounts of human-generated data and can therefore be prone to the same biases found in those data sources.
A research team from Cornell University, Olin College, and Stanford University tested this hypothesis by instructing CAs to demonstrate empathy while interacting with or discussing 65 different human identities.
The team found that CAs tend to make biased judgments about certain identities, such as gay and Muslim people, and can even show support for identities associated with harmful ideologies, including Nazism.
Lead author Andrea Cuadra, now a postdoctoral researcher at Stanford, remarked, “Automated empathy has the potential for significant positive impact in areas like education or healthcare. However, it is crucial to approach it with critical perspectives to mitigate possible harm.”
Cuadra will present “The Illusion of Empathy? Notes on Displays of Emotion in Human-Computer Interaction” at CHI ’24, the Association for Computing Machinery conference on Human Factors in Computing Systems, scheduled for May 11-18 in Honolulu. Her co-authors at Cornell University include Nicola Dell, Deborah Estrin, and Malte Jung.
The researchers found that while LLMs excelled at emotional reactions, they struggled with interpretation and exploration. In other words, an LLM can respond to a query based on its training, but it cannot dig deeper into what the user is experiencing.
Dell, Estrin, and Jung were inspired to explore this topic as Cuadra studied the use of earlier-generation CAs among older adults.
Estrin explained, “While investigating the technology’s applications for tasks like frailty health assessments and reminiscence experiences, Cuadra observed clear instances of its conflicting ‘empathy,’ which could be at once engaging and unsettling.”
This research was funded by the National Science Foundation, a Cornell Tech Digital Life Initiative Doctoral Fellowship, a Stanford PRISM Baker Postdoctoral Fellowship, and the Stanford Institute for Human-Centered Artificial Intelligence.