According to Shah, AI Overviews' recommendation of a pizza recipe containing glue likely stemmed from a misstep in the retrieval process: a retrieved passage can be relevant without being correct, and the generation step does not question the accuracy of the information it is handed.
A RAG system can also struggle with conflicting information, such as different versions of a policy handbook, and return misleading answers, as the sketch below illustrates. Verberne notes that although large language models generate fluent language, fluency is no guarantee that the information is accurate.
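To make the point concrete, here is a minimal, purely illustrative sketch of relevance-based retrieval (not any production system, and the policy documents are hypothetical): passages are scored only by how well their terms overlap with the query, so an outdated and a current version of a handbook can both rank as "relevant", and nothing in the pipeline checks which one is true.

```python
# Toy retriever: ranks passages by term overlap with the query.
# Relevance scoring has no notion of correctness or recency.

def score(query: str, passage: str) -> float:
    """Fraction of query terms that appear in the passage (toy relevance)."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0

corpus = [
    # Hypothetical conflicting documents: the retriever cannot tell
    # which version of the policy is current or correct.
    "Policy handbook 2021: remote work requires manager approval",
    "Policy handbook 2024: remote work is allowed two days per week",
]

query = "what is the remote work policy"
ranked = sorted(corpus, key=lambda p: score(query, p), reverse=True)
for passage in ranked:
    print(f"{score(query, passage):.2f}  {passage}")

# Both versions score as relevant; whichever passage is handed to the
# generator shapes the answer, and the model will phrase either one fluently.
```

Real systems use learned embeddings rather than term overlap, but the underlying limitation is the same: the retriever optimizes for similarity to the query, not for factual accuracy.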
Verberne also highlights that misinformation in large language models' output is concentrated in specific topics, particularly in fields such as medicine, education, and science.
Google acknowledges that AI Overviews can return incorrect answers when high-quality information is scarce for a query or when the query matches satirical content. The company maintains that the large majority of responses are high quality, with only a small percentage containing potentially harmful content.
Not Just about Bad Training Data
Mitchell's experience with AI Overviews giving an incorrect answer about the number of Muslim presidents in the US shows that the system can generate misinformation even when its sources are factually correct. Although the answer drew on an academic book, the system misinterpreted the information and produced an inaccurate response.