Sunday, June 8, 2025
News PouroverAI
Visit PourOver.AI
No Result
View All Result
  • Home
  • AI Tech
  • Business
  • Blockchain
  • Data Science & ML
  • Cloud & Programming
  • Automation
  • Front-Tech
  • Marketing
  • Home
  • AI Tech
  • Business
  • Blockchain
  • Data Science & ML
  • Cloud & Programming
  • Automation
  • Front-Tech
  • Marketing
News PouroverAI
No Result
View All Result

The German Tank Problem. Estimating your chances of winning the… | by Dorian Drost | Mar, 2024

March 6, 2024
in AI Technology
Reading Time: 5 mins read
0 0
A A
0
Share on FacebookShare on Twitter



Estimating your chances of winning the lottery with sampling

Statistical estimates can be fascinating, can’t they? By just sampling a few instances from a population, you can infer properties of that population such as the mean value or the variance. Likewise, under the right circumstances, it is possible to estimate the total size of the population, as I want to show you in this article.

I will use the example of drawing lottery tickets to estimate how many tickets there are in total, and hence calculate the likelihood of winning. More formally, this means to estimate the population size given a discrete uniform distribution. We will see different estimates and discuss their differences and weaknesses. In addition, I will point you to some other use cases this approach can be used in.

Playing the lottery

Let’s imagine I go to a state fair and buy some tickets in the lottery. As a data scientist, I want to know the probability of winning the main prize, of course. Let’s assume there is just a single ticket that wins the main prize. So, to estimate the likelihood of winning, I need to know the total number of lottery tickets N, as my chance of winning is 1/N then (or k/N, if I buy k tickets). But how can I estimate that N by just buying a few tickets (which are, as I saw, all losers)?

I will make use of the fact, that the lottery tickets have numbers on them, and I assume, that these are consecutive running numbers (which means, that I assume a discrete uniform distribution). Say I have bought some tickets and their numbers in order are [242,412,823,1429,1702]. What do I know about the total number of tickets now? Well, obviously there are at least 1702 tickets (as that is the highest number I have seen so far). That gives me a first lower bound of the number of tickets, but how accurate is it for the actual number of tickets? Just because the highest number I have drawn is 1702, that doesn’t mean that there are any numbers higher than that. It is very unlikely, that I caught the lottery ticket with the highest number in my sample.

However, we can make more out of the data. Let us think as follows: If we knew the middle number of all the tickets, we could easily derive the total number from that: If the middle number is m, then there are m-1 tickets below that middle number, and there are m+1 tickets above that. That is, the total number of tickets would be (m-1) + (m+1) + 1, (with the +1 being the ticket of number m itself), which is equal to 2m-1. We don’t know that middle number m, but we can estimate it by the mean or the median of our sample. My sample above has the (rounded) average of 922, which yields 2*922-1 = 1843. That is, from that calculation the estimated number of tickets is 1843.

That was quite interesting so far, as just from a few lottery ticket numbers, I was able to give an estimate of the total number of tickets. However, you may wonder if that is the best estimate we can get. Let me spoil you right away: It is not.

The method we used has some drawbacks. Let me demonstrate that to you with another example: Say we have the numbers [12,30,88], which leads us to 2*43–1 = 85. That means, the formula suggests there are 85 tickets in total. However, we have ticket number 88 in our sample, so this can not be true at all! There is a general problem with this method: The estimated N can be lower than the highest number in the sample. In that case, the estimate has no meaning at all, as we already know, that the highest number in the sample is a natural lower bound of the overall N.

A better approach: Using even spacing

Okay, so what can we do? Let us think in a different direction. The lottery tickets I bought have been sampled randomly from the distribution that goes from 1 to unknown N. My ticket with the highest number is number 1702, and I wonder, how far away is this from being the highest ticket at all. In other words, what is the gap between 1702 and N? If I knew that gap, I could easily calculate N from that. What do I know about that gap, though? Well, I have reason to assume that this gap is expected to be as big as all the other gaps between two consecutive tickets in my sample. The gap between the first and the second ticket should, on average, be as big as the gap between the second and the third ticket, and so on. There is no reason why any of those gaps should be bigger or smaller than the others, except for random deviation, of course. I sampled my lottery tickets independently, so they should be evenly spaced on the range of all possible ticket numbers. On average, the numbers in the range of 0 to N would look like birds on a power line, all having the same gap between them.

That means I expect N-1702 to equal the average of all the other gaps. The other gaps are 242–0=242, 412–242=170, 823–412=411, 1429–823=606, 1702–1429=273, which gives the average 340. Hence I estimate N to be 1702+340=2042. In short, this can be denoted by the following formula:
N = x + (x-k)/k

Here x is the biggest number observed (1702, in our case), and k is the number of samples (5, in our case). This is just a short form of calculating the average as we just did.

Let’s do a simulation

We just saw two estimates of the total number of lottery tickets. First, we calculated 2*m-1, which gave us 1843, and then we used the more sophisticated approach of x + (x-k)/k and obtained 2042. I wonder which estimation is more correct now? Are my chances of winning the lottery 1/1843 or 1/2042?

To show some properties of the estimates we just used, I did a simulation. I drew samples of different sizes k from a distribution, where the highest number is 2000, and that I did a few hundred times each. Hence we would expect that our estimates also return 2000, at least on average. This is the outcome of the simulation:
Probability densities of the different estimates for varying k. Note that the ground truth N is 2000. Image by author.

What do we see here? On the x-axis, we see the k, i.e. the number of samples we take. For each k, we see the distribution of the estimates based on a few hundred simulations for the two formulas we just got to know. The dark point indicates the mean value of the simulations each, which is always 2000, independent of the k. That is a very interesting point: Both estimates converge to the correct value if they are repeated an infinite number of times.

However, besides the common average, the distributions differ quite a lot. We see, that the formula 2*m-1 has higher variance, i.e. its estimates are far away from the real value more often than for the other formula. The variance has a tendency to decrease with higher k though. This decrease does not always hold perfectly, as this is just as simulation and is still subject to random influences. However, it is quite understandable and expected: The more samples I take, the more precise is my estimation. That is a very common property of statistical estimates.

We also see that the deviations are symmetrical, i.e. underestimating the real value is as likely as overestimating it. For the second approach, this symmetry does not hold: While most of the density is above the real mean, there are more and larger outliers below. How does that come? Let’s retrace how we computed that estimate. We took the biggest number in our sample and added the average gap size to that. Naturally, the biggest number in our sample can only be as big as the biggest number in total (the N that we…



Source link

Tags: chancesDorianDrostEstimatingGermanMarProblemTanktheâWinning
Previous Post

OpenAI pushes back on Musk after lawsuit: Tesla CEO ‘sued us when we started making meaningful progress’

Next Post

How to Buy Bitcoin ETFs?

Related Posts

How insurance companies can use synthetic data to fight bias
AI Technology

How insurance companies can use synthetic data to fight bias

June 10, 2024
From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset
AI Technology

From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

June 10, 2024
Decoding Decoder-Only Transformers: Insights from Google DeepMind’s Paper
AI Technology

Decoding Decoder-Only Transformers: Insights from Google DeepMind’s Paper

June 9, 2024
How Game Theory Can Make AI More Reliable
AI Technology

How Game Theory Can Make AI More Reliable

June 9, 2024
Buffer of Thoughts (BoT): A Novel Thought-Augmented Reasoning AI Approach for Enhancing Accuracy, Efficiency, and Robustness of LLMs
AI Technology

Buffer of Thoughts (BoT): A Novel Thought-Augmented Reasoning AI Approach for Enhancing Accuracy, Efficiency, and Robustness of LLMs

June 9, 2024
Deciphering Doubt: Navigating Uncertainty in LLM Responses
AI Technology

Deciphering Doubt: Navigating Uncertainty in LLM Responses

June 9, 2024
Next Post
How to Buy Bitcoin ETFs?

How to Buy Bitcoin ETFs?

Marathon Digital Holdings Reports Lower Bitcoin Production in February Due to Maintenance Issues

Marathon Digital Holdings Reports Lower Bitcoin Production in February Due to Maintenance Issues

CrowdStrike and Dell unleash an AI-powered, unified security vision

CrowdStrike and Dell unleash an AI-powered, unified security vision

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

  • Trending
  • Comments
  • Latest
23 Plagiarism Facts and Statistics to Analyze Latest Trends

23 Plagiarism Facts and Statistics to Analyze Latest Trends

June 4, 2024
Accenture creates a regulatory document authoring solution using AWS generative AI services

Accenture creates a regulatory document authoring solution using AWS generative AI services

February 6, 2024
Managing PDFs in Node.js with pdf-lib

Managing PDFs in Node.js with pdf-lib

November 16, 2023
Graph neural networks in TensorFlow – Google Research Blog

Graph neural networks in TensorFlow – Google Research Blog

February 6, 2024
13 Best Books, Courses and Communities for Learning React — SitePoint

13 Best Books, Courses and Communities for Learning React — SitePoint

February 4, 2024
From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

June 10, 2024
Can You Guess What Percentage Of Their Wealth The Rich Keep In Cash?

Can You Guess What Percentage Of Their Wealth The Rich Keep In Cash?

June 10, 2024
AI Compared: Which Assistant Is the Best?

AI Compared: Which Assistant Is the Best?

June 10, 2024
How insurance companies can use synthetic data to fight bias

How insurance companies can use synthetic data to fight bias

June 10, 2024
5 SLA metrics you should be monitoring

5 SLA metrics you should be monitoring

June 10, 2024
From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

June 10, 2024
UGRO Capital: Targeting to hit milestone of Rs 20,000 cr loan book in 8-10 quarters: Shachindra Nath

UGRO Capital: Targeting to hit milestone of Rs 20,000 cr loan book in 8-10 quarters: Shachindra Nath

June 10, 2024
Facebook Twitter LinkedIn Pinterest RSS
News PouroverAI

The latest news and updates about the AI Technology and Latest Tech Updates around the world... PouroverAI keeps you in the loop.

CATEGORIES

  • AI Technology
  • Automation
  • Blockchain
  • Business
  • Cloud & Programming
  • Data Science & ML
  • Digital Marketing
  • Front-Tech
  • Uncategorized

SITEMAP

  • Disclaimer
  • Privacy Policy
  • DMCA
  • Cookie Privacy Policy
  • Terms and Conditions
  • Contact us

Copyright © 2023 PouroverAI News.
PouroverAI News

No Result
View All Result
  • Home
  • AI Tech
  • Business
  • Blockchain
  • Data Science & ML
  • Cloud & Programming
  • Automation
  • Front-Tech
  • Marketing

Copyright © 2023 PouroverAI News.
PouroverAI News

Welcome Back!

Login to your account below

Forgotten Password? Sign Up

Create New Account!

Fill the forms bellow to register

All fields are required. Log In

Retrieve your password

Please enter your username or email address to reset your password.

Log In