Researchers from Microsoft Research and Georgia Tech Unveil Statistical Boundaries of Hallucinations in Language Models

December 6, 2023
in AI Technology

A key issue that has recently surfaced with Language Models (LMs) is the high rate at which they provide erroneous information, including references to nonexistent article titles. The Merriam-Webster dictionary defines this kind of hallucination as “a plausible but false or misleading response generated by an artificial intelligence algorithm.” In one instance, attorneys who submitted legal research containing fabricated court cases they believed to be accurate faced a $5,000 penalty. In medicine, hallucinations could be fatal to patients, and doctors worry about being sued for negligence. The media has also covered hallucinations extensively, and the President of the United States recently issued an Executive Order requesting, among other things, safeguards against deceptive output from generative artificial intelligence systems.

In this work, researchers from Microsoft Research and Georgia Tech present statistical lower bounds on the hallucination rate of language models (LMs) that are calibrated fact predictors, shedding light on the nature of hallucinations. This does not imply that hallucinations are unavoidable: as the research team discusses, it is consistent with the growing practice of supplementing “pretraining” procedures with “post-training” procedures that lower both hallucination rates and calibration. An LM is simply a probability distribution D over sequences of tokens, i.e., words or other character sequences. Any LM that assigns positive probability to every string (a typical characteristic of LMs) will necessarily hallucinate with positive probability. If that probability is low, however, hallucinations will be rare, so measuring the frequency of hallucinations is essential.
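
As a toy illustration of this point (not from the paper), the sketch below treats an LM as an explicit distribution over a handful of complete sentences, some true and some false; the hallucination rate is simply the probability mass placed on the false ones, which is positive but can be small.

```python
# A minimal sketch: an LM viewed as a probability distribution D over token
# sequences. If D assigns positive probability to every string, it also assigns
# positive probability to false ones, so the hallucination rate is the total
# mass on the false set. All values below are made up for illustration.

D = {
    "Paris is the capital of France.": 0.55,
    "Water boils at 100 C at sea level.": 0.40,
    "Paris is the capital of Spain.": 0.04,     # false, but has positive probability
    "Water boils at 10 C at sea level.": 0.01,  # false, but has positive probability
}

false_sentences = {
    "Paris is the capital of Spain.",
    "Water boils at 10 C at sea level.",
}

# Probability that a single sample from D is a hallucination.
hallucination_rate = sum(p for s, p in D.items() if s in false_sentences)
print(f"hallucination rate: {hallucination_rate:.2f}")  # 0.05: rare, but nonzero
```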

Any distribution D can be expressed equivalently through log-probabilities of complete sequences or through conditional log-probabilities of the next token given the preceding ones: log D(t₁ … tₘ) = Σᵢ₌₁ᵐ log D(tᵢ | t₁ … tᵢ₋₁). This seemingly trivial mathematical equivalence has a significant implication: although prediction and generation have different requirements, any LM can be used either to generate text or to predict the next token in naturally occurring text conditioned on the preceding tokens. Take the following sentence, for example: “Alexa Wilkins went to Salumeria last Tuesday for lunch because the reviews said the tuna sandwich was amazing.” A predictive language model might suggest such sentences to reduce typing on a phone; it may be useful to predict “sandwich” as the word to enter after “tuna,” along with other plausible words such as “salad.”
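
The identity above can be made concrete with a small sketch (invented conditional probabilities, not the paper's model): accumulating next-token conditionals yields the log-probability of the whole sequence.

```python
import math

# Sketch of the identity log D(t1 ... tm) = sum_i log D(ti | t1 ... t_{i-1}):
# a model that scores the next token also scores whole sequences, and vice versa.
# The conditional probabilities below are hypothetical illustrative values.

def next_token_logprob(context, token):
    """Hypothetical conditional distribution D(token | context)."""
    table = {
        (): {"the": 0.5, "a": 0.5},
        ("the",): {"tuna": 0.3, "reviews": 0.7},
        ("the", "tuna"): {"sandwich": 0.6, "salad": 0.4},
    }
    return math.log(table[tuple(context)][token])

def sequence_logprob(tokens):
    """log D(t1 ... tm), accumulated one conditional at a time."""
    total = 0.0
    for i, token in enumerate(tokens):
        total += next_token_logprob(tokens[:i], token)
    return total

print(sequence_logprob(["the", "tuna", "sandwich"]))  # log(0.5) + log(0.3) + log(0.6)
```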

However, it would be wrong for a generative LM to fabricate the vast majority of such sentences at random, since they would be false. This article argues that even under ideal circumstances, LMs with strong predictive-text ability should be expected to hallucinate. Notably, in the initial pretraining step, which is standard practice today, the generative LM is optimized for predictive-text performance. Moreover, the analysis offers a lower bound on the hallucination rate, which may shed light on the different rates at which different kinds of facts should be hallucinated. What the example above and plausible-looking references (which the research team call 5W = Who-Ate-What-When-Where-Why factoids) have in common is that they are arbitrary: neither can be determined systematically by rules, so most such facts cannot be verified because they are not included in the training data.
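
A brief sketch of this contrast, with made-up probabilities: the same conditional distribution can either surface a few plausible next words for a human to pick from (prediction) or commit to one sampled continuation (generation), and only the latter asserts arbitrary specifics as if they were facts.

```python
import random

# Illustration (not the paper's method): two uses of one conditional distribution.
# Prediction offers candidate next words; generation commits to a sampled choice,
# which can state who-ate-what-when-where-why specifics as if they were true.

next_word = {
    ("tuna",): {"sandwich": 0.55, "salad": 0.35, "melt": 0.10},  # made-up values
}

def suggest(context, k=2):
    """Predictive use: offer the k most probable next words."""
    dist = next_word[tuple(context)]
    return sorted(dist, key=dist.get, reverse=True)[:k]

def generate(context):
    """Generative use: commit to one sampled continuation."""
    dist = next_word[tuple(context)]
    words, probs = zip(*dist.items())
    return random.choices(words, weights=probs, k=1)[0]

print(suggest(["tuna"]))   # ['sandwich', 'salad'] -- useful suggestions
print(generate(["tuna"]))  # a single committed choice, right or wrong
```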

This is in contrast to facts whose validity can be systematically ascertained. Even in a simplified setting with many ideal qualities, the research team estimates how many hallucinations LMs should experience. The team prefers simplicity over generality, since their lower bounds are statistical and their goal is to pinpoint the underlying source of LM hallucinations. They seek a hallucination lower bound that holds in the simplest setting, where the training data is i.i.d. and free of factual mistakes, much as in classification one seeks a lower bound on difficulty in the noiseless setting (even though noise-tolerant classification techniques exist).

The research team offers a natural extension of calibration to generative models. Their notion differs from previous applications of calibration to LMs, which operated at the token level. Since any fact can be phrased in natural language in many ways, calibrating token probabilities is only useful for evaluating raw token probabilities. Instead, their semantic-level calibration considers the probability distribution over the pieces of information (facts or hallucinations) in the text. An LM is considered calibrated if, for any given probability z ∈ [0, 1], the information it generates with probability ≈ z appears, on average, in a fraction ≈ z of naturally occurring language (ideally the distribution from which the training data was collected).
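
A minimal sketch of how such a semantic-level calibration check might look (my illustration, with hypothetical numbers, not the paper's code): group facts by the probability the model assigns to generating them, then compare each group's average model probability with the facts' average frequency in natural text.

```python
from collections import defaultdict

# (model_probability, empirical_frequency) pairs for hypothetical facts.
facts = [
    (0.010, 0.012), (0.011, 0.009),  # facts generated with probability ~1%
    (0.100, 0.095), (0.098, 0.110),  # facts generated with probability ~10%
]

def calibration_gaps(facts, bin_width=0.05):
    """For each probability bin, the gap between average model probability
    and average empirical frequency; ~0 everywhere for a calibrated model."""
    bins = defaultdict(list)
    for model_p, empirical_p in facts:
        bins[round(model_p / bin_width)].append((model_p, empirical_p))
    gaps = {}
    for b, pairs in bins.items():
        avg_model = sum(p for p, _ in pairs) / len(pairs)
        avg_empirical = sum(q for _, q in pairs) / len(pairs)
        gaps[b] = abs(avg_model - avg_empirical)
    return gaps

print(calibration_gaps(facts))
```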

This work aims to explain this phenomenon by demonstrating that pretraining LMs for predictive accuracy leads to hallucinations even in an ideal world where the training data is perfectly factual, there is no blurring of facts and hallucinations, each document contains at most one fact, and there is not even a prompt that would encourage hallucination. Furthermore, their theory clarifies why contemporary LMs hallucinate more than earlier LMs, such as trigram models, despite being trained on comparable datasets with comparable objectives. The monofact rate may indicate the rates at which calibrated LMs must hallucinate for various kinds of facts.

Hallucinations are predicted when facts have a high monofact rate, that is, when many facts appear only once in the training data. Interestingly, this is uncommon for references to books or articles, a problematic kind of hallucination currently under study; given the sheer quantity of facts, references included, that an LM encounters during training, hallucinated references may instead result from other problems, such as limited model capacity. Additionally, it may be possible to fix hallucinated references by altering the pretraining pipeline alone, without post-training, but this would not help with other kinds of arbitrary facts, like those in their 5W example, where monofacts are frequent.
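
A small sketch of estimating a monofact rate on a toy corpus (invented data; in the paper's idealized setting each training document carries at most one fact): count the fraction of training documents whose fact appears exactly once.

```python
from collections import Counter

# Toy training corpus where each document states one fact. Arbitrary one-off
# facts (the 5W kind) appear once; systematic facts repeat. The monofact rate,
# the fraction of documents whose fact occurs exactly once, is the quantity the
# article ties to the hallucination rate of a calibrated LM.

training_facts = [
    "alexa ate a tuna sandwich at salumeria on tuesday",  # appears once
    "bo ate a salad at cafe-x on friday",                 # appears once
    "paris is the capital of france",
    "paris is the capital of france",                     # systematic fact, repeated
]

counts = Counter(training_facts)
monofact_rate = sum(1 for c in counts.values() if c == 1) / len(training_facts)
print(f"monofact rate: {monofact_rate:.2f}")  # 0.50: many one-off facts, so a
# calibrated model is expected to hallucinate on facts of this kind
```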

All credit for this research goes to the researchers of this project.
