Why so many collisions with the MariaDB 10.2 RAND() function?
Question
Note: Running MariaDB 10.2.27 on Windows Server 2012 R2 Standard.
I wanted to generate random integers for use in MariaDB so I have been experimenting with the MariaDB RAND() function. Either my expectations & understanding are way off base (definitely possible!) or the MariaDB RAND() function is not very random.
Using a BIGINT(20) column I wanted to generate random integers up to 16 digits in length, so I used this SQL: FLOOR(RAND()*9999999999999999)+1. The exact SQL I use, in a loop, is:
INSERT INTO rnd_test VALUES (FLOOR(RAND()*9999999999999999)+1);
Table rnd_test has a single column, which is a BIGINT(20) and is the primary key.
With a pool of 10^16 numbers and considering the Birthday Paradox I would expect about a 50% chance of a collision after 10^8 numbers generated. Obviously there is some variance to this but every time I run the insert loop I start seeing collisions almost immediately and then repeating every 2000 or 3000 generated numbers, sometimes more often. After ~50,000 randoms have been generated I'm seeing collisions every few hundred numbers.
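The birthday estimate above can be checked with a few lines of arithmetic. This is a minimal sketch in Python, not something from the original post; the function name `collision_probability` is made up for illustration, and it uses the standard approximation p ≈ 1 - exp(-n(n-1)/(2N)) for n uniform draws from a pool of N values.

```python
import math

def collision_probability(draws: int, pool_size: float) -> float:
    """Approximate chance of at least one repeat among `draws` uniform picks.

    Uses the usual birthday approximation 1 - exp(-n(n-1) / (2N)).
    """
    return -math.expm1(-draws * (draws - 1) / (2.0 * pool_size))

POOL = 10**16  # the 16-digit pool described in the question

for n in (10**5, 10**6, 10**7, 10**8):
    p = collision_probability(n, POOL)
    print(f"{n:>11,d} draws -> P(at least one collision) ~ {p:.6f}")
```

Under this approximation 10^8 draws give a bit under 40%, and the 50% point sits near 1.2 x 10^8, so the expectation stated in the question is the right order of magnitude for a generator that really covers the whole pool.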
Thinking that maybe my understanding was wildly incorrect, I adjusted my looping VB.NET code to generate the random number locally and then insert that into the MariaDB table. I define a new System.Random at the top of the routine and then use it to generate random numbers:
Dim r As Long = CLng(Math.Floor(rNum.NextDouble() * 9999999999999999)) + 1
Generally this works much better but still not quite as well as I would expect. It will usually run for about 100,000 iterations before a collision occurs, and then after that there seems to be one or two collisions per 10,000 randoms generated. Sometimes a batch of 10,000 will go by without any collisions at all.
So, why is the MariaDB RAND() function performing so poorly compared to the VB.NET function?
Recommended answer
Further experiments with MySQL 5.6's RAND() indicate that it is good to only 30 bits. That is, it has only about one billion distinct values.
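To see what a 30-bit generator would mean for the insert loop, the same birthday arithmetic can be rerun with 2^30 distinct values instead of 10^16. The sketch below is in Python and takes the 30-bit figure above as given; the helper names are hypothetical, and it models RAND() as uniform over exactly 2^30 values, which is an assumption rather than a description of the server's implementation.

```python
import math

def draws_for_half_chance(distinct_values: float) -> float:
    """Draws needed for roughly a 50% chance of at least one repeat."""
    return math.sqrt(2.0 * math.log(2.0) * distinct_values)

def expected_repeats(draws: int, distinct_values: float) -> float:
    """Expected number of draws that repeat an earlier value.

    Equals n - N * (1 - (1 - 1/N)^n), i.e. draws minus expected distinct values.
    """
    n, N = draws, float(distinct_values)
    expected_distinct = -N * math.expm1(n * math.log1p(-1.0 / N))
    return n - expected_distinct

for label, pool in (("30-bit generator", 2**30), ("full 10^16 pool", 10**16)):
    half = draws_for_half_chance(pool)
    reps = expected_repeats(50_000, pool)
    print(f"{label}: ~{half:,.0f} draws for 50%, "
          f"~{reps:.6f} expected repeats in 50,000 draws")
```

With only about a billion distinct values the 50% point drops from roughly a hundred million draws to a few tens of thousands, and the printed repeat counts can be compared against the rates reported in the question.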
File a bug at bugs.mysql.com and/or with MariaDB.
30 bits is good enough for most applications. Of those applications for which it is not good enough, most people won't notice its wimpiness.
FLOAT has 24 bits of precision; DOUBLE has 53. So 30 needs DOUBLE, but fails to fill it up.
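The 24-bit and 53-bit figures can be illustrated by checking which integers survive a round trip through each type. This is a small Python sketch, not MariaDB code: Python's `float` is an IEEE 754 double, and the helper `to_float32` (a name chosen here) uses the standard `struct` module to round a value to single precision.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

for bits in (24, 30, 53):
    n = 2**bits + 1  # smallest odd integer that needs bits + 1 significant bits
    fits_single = to_float32(float(n)) == n
    fits_double = float(n) == n
    print(f"2^{bits} + 1: exact in FLOAT? {fits_single}  exact in DOUBLE? {fits_double}")
```

A value with 30 significant bits is already too wide for single precision but uses well under the 53 bits a double can hold, which is the gap the answer is pointing at.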