Why do people say there is modulo bias when using a random number generator?(为什么人们说使用随机数生成器时存在模偏差?)
问题描述
I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly there is "modulo bias" when using a random number generator, like rand()
in C++.
So rand()
is a pseudo-random number generator which chooses a natural number between 0 and RAND_MAX
, which is a constant defined in cstdlib
(see this article for a general overview on rand()
).
Now what happens if you want to generate a random number between say 0 and 2? For the sake of explanation, let's say RAND_MAX
is 10 and I decide to generate a random number between 0 and 2 by calling rand()%3
. However, rand()%3
does not produce the numbers between 0 and 2 with equal probability!
When rand()
returns 0, 3, 6, or 9, rand()%3 == 0
. Therefore, P(0) = 4/11
When rand()
returns 1, 4, 7, or 10, rand()%3 == 1
. Therefore, P(1) = 4/11
When rand()
returns 2, 5, or 8, rand()%3 == 2
. Therefore, P(2) = 3/11
This does not generate the numbers between 0 and 2 with equal probability. Of course for small ranges this might not be the biggest issue but for a larger range this could skew the distribution, biasing the smaller numbers.
So when does rand()%n
return a range of numbers from 0 to n-1 with equal probability? When RAND_MAX%n == n - 1
. In this case, along with our earlier assumption rand()
does return a number between 0 and RAND_MAX
with equal probability, the modulo classes of n would also be equally distributed.
So how do we solve this problem? A crude way is to keep generating random numbers until you get a number in your desired range:
int x;
do {
x = rand();
} while (x >= n);
but that's inefficient for low values of n
, since you only have a n/RAND_MAX
chance of getting a value in your range, and so you'll need to perform RAND_MAX/n
calls to rand()
on average.
A more efficient formula approach would be to take some large range with a length divisible by n
, like RAND_MAX - RAND_MAX % n
, keep generating random numbers until you get one that lies in the range, and then take the modulus:
int x;
do {
x = rand();
} while (x >= (RAND_MAX - RAND_MAX % n));
x %= n;
For small values of n
, this will rarely require more than one call to rand()
.
Works cited and further reading:
CPlusPlus Reference
Eternally Confuzzled
这篇关于为什么人们说使用随机数生成器时存在模偏差?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:为什么人们说使用随机数生成器时存在模偏差?
基础教程推荐
- 您如何将 CreateThread 用于属于类成员的函数? 2021-01-01
- C++ 标准:取消引用 NULL 指针以获取引用? 2021-01-01
- 如何定义双括号/双迭代器运算符,类似于向量的向量? 2022-01-01
- C++,'if' 表达式中的变量声明 2021-01-01
- 什么是T&&(双与号)在 C++11 中是什么意思? 2022-11-04
- 运算符重载的基本规则和习语是什么? 2022-10-31
- 如何在 C++ 中处理或避免堆栈溢出 2022-01-01
- C++ 程序在执行 std::string 分配时总是崩溃 2022-01-01
- 设计字符串本地化的最佳方法 2022-01-01
- 调用std::Package_TASK::Get_Future()时可能出现争用情况 2022-12-17