Friday, May 16, 2025
News PouroverAI

Drawing From a Random Distribution in SQL | by Sami Abboud | Feb, 2024

February 9, 2024
in AI Technology



From a probability density function to random samples
Photo by Moritz Kindler on Unsplash

There are various approaches to updating a reinforcement learning agent’s policy during each iteration. Recently, we began experimenting with replacing our current method with a Bayesian inference step. Some of the data workloads in our agent are written in SQL and executed on GCP’s BigQuery engine. We chose this stack because it offers scalable computational capabilities, ML packages, and a straightforward SQL interface.

The Bayesian inference step we wanted to implement uses one of the alternative parametrizations of a beta distribution, which means we need to be able to draw samples from a beta distribution in SQL. While working on this, I realized that there are very few well-documented examples of how to draw from random distributions in SQL, so I am documenting my findings here.

Unfortunately, BigQuery has neither a built-in beta distribution nor a general way to draw samples from an arbitrary distribution; `RAND()` only returns uniform values. My initial approach was to define the beta distribution's PDF in SQL, set the parameters, generate a random number between 0 and 1, and calculate the value of the function at that number. However, this approach did not work as expected.

To find a solution, I decided to ask ChatGPT for help. I asked it how to create random draws from a beta distribution in BigQuery. However, the code provided by ChatGPT had a flaw. It drew two different x values for the presumed beta distribution probability density function (PDF). I fixed the code, made some query optimizations, and sampled 1,000 values. Here is the updated SQL code:

```sql
WITH raw_data AS (
  SELECT 'user_a' AS contact_id, 0.674 AS probability, 72 AS lifetime_messages_received
), parameters AS (
  SELECT
    contact_id,
    probability * lifetime_messages_received AS alpha,
    (1.0 - probability) * lifetime_messages_received AS beta,
    RAND() AS x  -- one x per row, shared by both POW terms below
  FROM raw_data
  CROSS JOIN UNNEST(GENERATE_ARRAY(1, 1000)) AS draw_id
)
SELECT
  contact_id,
  ARRAY_AGG(POW(x, alpha - 1.0) * POW(1.0 - x, beta - 1)) AS beta_x
FROM parameters
GROUP BY contact_id
```

However, when I compared the results of the SQL code with a trusted implementation using SciPy’s `beta.rvs()` function in Python, I noticed that the distributions were different. I realized that I had left the scaling (normalizing) constant out of my SQL calculation. The beta PDF is multiplied by a constant that depends on the gamma function, which I had not accounted for.
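For reference, the constant in question is Γ(α+β) / (Γ(α)Γ(β)). The following is a minimal Python sketch of the normalized density, not the original implementation: it uses the standard library's `math.lgamma` instead of SciPy (an assumption, made so the snippet has no dependencies), and the name `beta_pdf` is illustrative. The parameters are the ones from the SQL example.

```python
import math

def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Normalized beta density at x; returns 0 outside the open interval (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    # Log of the normalizing constant Gamma(a + b) / (Gamma(a) * Gamma(b))
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    # Log of the part the SQL query computed: x^(a-1) * (1-x)^(b-1)
    log_kernel = (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)
    return math.exp(log_norm + log_kernel)

# Parameters from the SQL example: probability = 0.674, lifetime_messages_received = 72
alpha = 0.674 * 72
beta = (1.0 - 0.674) * 72

# Midpoint-rule check that the normalized density integrates to about 1
n = 10_000
area = sum(beta_pdf((i + 0.5) / n, alpha, beta) for i in range(n)) / n
```

Working in log space avoids overflow in the gamma terms when the parameters are large.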

The problem was that the gamma function does not have a closed-form expression, and BigQuery does not provide an approximation for it. Therefore, I decided to switch to Python for a more efficient experimentation process. The plan was to get the implementation right in Python and then translate it to SQL. However, I still needed a way to approximate the gamma function.
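The text does not say which approximation was considered. One common option is a truncated Stirling series for the logarithm of the gamma function, sketched below purely as an assumption; the name `stirling_lgamma` is hypothetical. The snippet checks the series against Python's `math.lgamma` for the arguments the beta constant would need with the example parameters.

```python
import math

def stirling_lgamma(z: float) -> float:
    """Approximate ln(Gamma(z)) with a truncated Stirling series (intended for z well above 1)."""
    return (
        (z - 0.5) * math.log(z)
        - z
        + 0.5 * math.log(2.0 * math.pi)
        + 1.0 / (12.0 * z)
        - 1.0 / (360.0 * z**3)
        + 1.0 / (1260.0 * z**5)
    )

# Arguments needed for the beta constant with the example parameters
alpha = 0.674 * 72
beta = (1.0 - 0.674) * 72

# Largest absolute difference from the standard library's log-gamma
max_abs_error = max(
    abs(stirling_lgamma(z) - math.lgamma(z)) for z in (alpha, beta, alpha + beta)
)
```

The series needs only a logarithm and basic arithmetic, which is what would make it a candidate for translation to SQL. Its accuracy degrades for small arguments, so it should not be relied on when alpha or beta is close to or below 1.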

In Python, I implemented a manual draw from a beta distribution using the correct constant with the help of SciPy’s gamma function. I also realized that drawing from a random distribution means sampling from the inverse cumulative distribution function (CDF), not directly from the probability density function (PDF) as I had been doing.

After correcting my approach in Python, I compared the distribution drawn using my manual method with the distribution drawn using SciPy’s `beta.rvs()` function. The two distributions matched, indicating that the manual draw was successful.
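The manual draw and the comparison can be sketched roughly as follows. This is an illustration under stated assumptions, not the original code: the CDF is tabulated on a fixed grid and inverted by lookup, the standard library's `random.betavariate` stands in for SciPy's `beta.rvs()` as the trusted reference, and the names `tabulate_cdf` and `manual_draw` are hypothetical.

```python
import bisect
import math
import random
import statistics

def beta_pdf(x, alpha, beta):
    """Normalized beta density at x in (0, 1)."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(
        log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)
    )

def tabulate_cdf(alpha, beta, n_grid=2000):
    """Return cell midpoints on (0, 1) and the cumulative probability up to each."""
    xs = [(i + 0.5) / n_grid for i in range(n_grid)]
    cdf, running = [], 0.0
    for x in xs:
        running += beta_pdf(x, alpha, beta) / n_grid  # density times cell width
        cdf.append(running)
    total = cdf[-1]
    return xs, [c / total for c in cdf]  # renormalize so the last entry is exactly 1

def manual_draw(xs, cdf, rng):
    """Inverse-CDF draw: first grid point whose cumulative probability reaches u."""
    u = rng.random()
    return xs[bisect.bisect_left(cdf, u)]

alpha, beta = 0.674 * 72, (1.0 - 0.674) * 72
rng = random.Random(0)  # fixed seed so the comparison is repeatable
xs, cdf = tabulate_cdf(alpha, beta)

manual = [manual_draw(xs, cdf, rng) for _ in range(10_000)]
reference = [rng.betavariate(alpha, beta) for _ in range(10_000)]

# Summary statistics as a rough stand-in for comparing the two histograms
mean_gap = abs(statistics.mean(manual) - statistics.mean(reference))
sd_gap = abs(statistics.stdev(manual) - statistics.stdev(reference))
```

Both gaps should be small relative to the spread of the distribution; comparing histograms or quantiles would be a more thorough check.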

Now that I had a working implementation in Python, I could return to SQL. Since BigQuery does not readily provide an implementation of the gamma function, I decided to draw from the logistic distribution instead (with parameters a=0 and b=1) for simplicity. However, you can adjust the parameters based on the PDF support of the distribution you wish to draw from.

I developed SQL code that samples from the logistic distribution using the inverse CDF (quantile function) approach. This approach involves generating a discrete representation of the PDF, computing a discrete CDF, and then drawing random samples using the inverse CDF. The SQL code can be adapted for other distributions where you can obtain a discrete PDF representation by sampling it at consistent intervals.
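The three steps can be illustrated with a short sketch. It is written in Python rather than SQL so that it can be run locally, and it is not the SQL code referred to above; the grid range of -10 to 10, the step of 0.01, and the name `quantile` are assumptions chosen for the example.

```python
import bisect
import math
import random

def logistic_pdf(x: float) -> float:
    """Density of the logistic distribution with location 0 and scale 1."""
    e = math.exp(-x)
    return e / (1.0 + e) ** 2

# Step 1: discrete PDF sampled at consistent intervals (range and step are assumptions)
LOW, STEP, N_POINTS = -10.0, 0.01, 2001
xs = [LOW + i * STEP for i in range(N_POINTS)]
pdf = [logistic_pdf(x) for x in xs]

# Step 2: discrete CDF as a running sum, normalized so the last entry is exactly 1
cdf, running = [], 0.0
for p in pdf:
    running += p
    cdf.append(running)
cdf = [c / running for c in cdf]

# Step 3: inverse CDF lookup, returning the first grid point whose CDF reaches u
def quantile(u: float) -> float:
    return xs[bisect.bisect_left(cdf, u)]

rng = random.Random(0)  # fixed seed for repeatability
draws = [quantile(rng.random()) for _ in range(10_000)]
sample_mean = sum(draws) / len(draws)
```

The logistic distribution happens to have a closed-form quantile function, ln(u / (1 - u)), so the lookup can be checked against it. For a distribution without one, only `logistic_pdf` and the grid range would need to change.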

In conclusion, I have successfully implemented sampling from a random variable in SQL by approximating the inverse CDF. This approach can be used for various distributions, and it offers a way to draw random samples in BigQuery when direct random distribution sampling is not available.



Copyright © 2023 PouroverAI News.