Create constrained random numbers?
Problem description
How can I create m=5 random numbers that add up to, say, n=100? But the first random number is, say, 10 < x1 < 30, the second random number is 5 < x2 < 20, the third random number is 10 < x3 < 25, etc. So these five random numbers add up to 100. How can I create these five constrained numbers?
[[
Related problem A1): The standard way to create five random numbers that add up to 100 is to sample four numbers in [0,100], add the boundaries 0 and 100, and sort these six numbers [0,x1,x2,x3,x4,100]. The five random numbers I seek are the deltas. That is,
100 - x[4] = delta 5
x[4] - x[3] = delta 4
x[3] - x[2] = delta 3
x[2] - x[1] = delta 2
x[1] - 0 = delta 1
These five deltas will now add up to 100. For instance, they might be 0, 1, 2, 7, 90. Here is some code that also produces five non-negative integers adding up to 100 (it draws them from a multinomial distribution instead of sorting cut points):
import numpy

total_sum = 100
n = 5
# Five non-negative integer counts that always add up to total_sum
v = numpy.random.multinomial(total_sum, numpy.ones(n)/n)
]]
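The sort-and-difference recipe described in A1) can be sketched in plain Python as follows. This is a minimal illustration, not the asker's code. The function name `deltas_summing_to` is hypothetical, and it draws integer cut points with `random.randint`. Duplicate cut points are allowed, so a delta can be 0, as in the 0, 1, 2, 7, 90 example.

```python
import random

def deltas_summing_to(total, n, rng=random):
    """Return n non-negative integers that add up to `total`.

    Draws n-1 cut points in [0, total], sorts them together with the
    boundaries 0 and total, and returns the gaps between neighbours.
    """
    cuts = sorted(rng.randint(0, total) for _ in range(n - 1))
    points = [0] + cuts + [total]
    return [b - a for a, b in zip(points, points[1:])]

v = deltas_summing_to(100, 5)
print(v, sum(v))  # the sum is always 100
```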
For my problem, I cannot allow wide intervals to occur. The largest spread above is 90-7 = 83, which is too wide. So I have to specify a tighter spread, say [10,30]. This means the largest random number is 30, which rules out large spreads such as 83.
[[
Related problem A2): A partial solution to create five numbers with identical boundaries, 10 < x_i < 30, that add up to 100 is this: do as in A1), but add the lower boundary 10 to each delta. So I get the five random numbers that I seek like this:
100 - x[4] = delta 5 + 10
x[4] - x[3] = delta 4 + 10
x[3] - x[2] = delta 3 + 10
x[2] - x[1] = delta 2 + 10
x[1] - 0 = delta 1 + 10
Basically, I do exactly as in A1), but start from 10 instead of 0. Thus each number has the lower boundary 10, but there is no upper boundary: a number can be large, too large. How can I limit the upper boundary to 30? That is the open problem here.
]]
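One detail in A2) is worth making explicit. If the deltas from A1) still add up to 100 and 10 is then added to each of the five, the total becomes 150. To keep the total at 100, the deltas must add up to 100 - 5*10 = 50 before the lower bound is added back.

A minimal sketch of that shift follows, assuming the same sort-and-difference sampling as in A1). The helper name `shifted_numbers` is hypothetical. As the asker notes, nothing here enforces an upper bound.

```python
import random

def shifted_numbers(total, n, low, rng=random):
    """Return n integers, each >= low, that add up to `total`.

    Only the lower bound is enforced: the slack total - n*low is split
    with the sort-and-difference method and `low` is added back to
    every part. A single part can still be as large as total - (n-1)*low.
    """
    slack = total - n * low
    if slack < 0:
        raise ValueError("total is smaller than n * low")
    cuts = sorted(rng.randint(0, slack) for _ in range(n - 1))
    points = [0] + cuts + [slack]
    return [low + b - a for a, b in zip(points, points[1:])]

x = shifted_numbers(100, 5, 10)
print(x, sum(x))  # every part >= 10 and the sum is 100, but a part may exceed 30
```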
To recapitulate, the type of the problem I try to solve looks like this: I need five random numbers adding up to 100 and I need to specify the boundaries separately for each number, say [10,30] for the first random number, and then [5,10] for the second random number, and [15,35] for the third random number, etc. And they must all add up to 100.
But the real data I am using has ~100 numbers x_i (m=50), all of them adding up to roughly 400,000, and the range for a number x_i is typically [3000,5000]. These numbers are not exact; I am only trying to convey the problem size. The purpose is an MCMC simulation, so these numbers need to be generated quickly. People have suggested very elegant solutions that really do work, but they take too long, so I cannot use them. The problem is still unsolved. Ideally I would like an O(m)-time solution with O(1) memory.
This problem should not be NP-hard; it doesn't feel like it. There should be a polynomial-time solution, right?
Solution
Suppose you need n_1 in [10,30], n_2 in [20,40], n_3 in [30,50] and n_1+n_2+n_3=90.
If you need each possible triplet (n_1, n_2, n_3) to be equally likely, that's going to be difficult. The number of triples of the form (20, n_2, n_3) is greater than the number of the form (10, n_2, n_3), so you can't just pick n_1 uniformly.
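The claim about non-uniform first choices is easy to check by brute force for the example above. This sketch treats the bounds as inclusive. For each n_1, it counts how many (n_2, n_3) complete the sum 90. The function name `count_by_first` is hypothetical.

```python
def count_by_first(target=90, r1=(10, 30), r2=(20, 40), r3=(30, 50)):
    """Map each n_1 to the number of triples (n_1, n_2, n_3) with the given
    inclusive bounds whose sum equals `target`."""
    counts = {}
    n3_lo, n3_hi = r3
    for n1 in range(r1[0], r1[1] + 1):
        counts[n1] = sum(
            1
            for n2 in range(r2[0], r2[1] + 1)
            if n3_lo <= target - n1 - n2 <= n3_hi  # n_3 is forced by the sum
        )
    return counts

c = count_by_first()
print(c[10], c[20], c[30])  # 11 21 11
```

Picking n_1 = 20 should therefore be almost twice as likely as picking n_1 = 10.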
The incredibly slow but accurate way is to generate all 5 random numbers in their correct ranges and reject the whole group if the sum is not correct.
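A minimal sketch of that rejection approach follows, assuming inclusive integer bounds. The function name `rejection_sample` and the `max_tries` safeguard are illustrative additions, not part of the answer. The method is exact because every valid tuple is equally likely to be accepted. However, the acceptance rate collapses as the number of ranges grows: for the three-range example it is 331 out of 21*21*21 = 9261 boxes, about 3.6%.

```python
import random

def rejection_sample(ranges, target, rng=random, max_tries=1_000_000):
    """Draw each x_i uniformly from its inclusive range and keep the tuple
    only if it sums to `target`. Every valid tuple is equally likely."""
    for _ in range(max_tries):
        xs = [rng.randint(lo, hi) for lo, hi in ranges]
        if sum(xs) == target:
            return xs
    raise RuntimeError("no valid tuple found; the acceptance rate is too low")

print(rejection_sample([(10, 30), (20, 40), (30, 50)], 90))
```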
. . . AHA!
I found a way to parametrize the choice effectively. First, though, note for simplicity that the sum of the low bounds is the minimum possible sum. If you subtract the sum of the low bounds from the target and subtract the low bound from each generated number, you get a problem where each number is in the interval [0, max_k-min_k]. That simplifies the math and the array (list) handling. Let n_k be the 0-based choice, with 0<=n_k<=max_k-min_k.
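The change of variables can be written as a pair of small helpers. This is a sketch with hypothetical names (`to_zero_based`, `from_zero_based`). It assumes inclusive integer bounds and rejects targets outside [sum of lows, sum of highs], for which no tuple exists.

```python
def to_zero_based(ranges, target):
    """Return (widths m_k, shifted target T) for inclusive integer ranges."""
    lows = [lo for lo, _ in ranges]
    highs = [hi for _, hi in ranges]
    if not sum(lows) <= target <= sum(highs):
        raise ValueError("target is outside [sum of lows, sum of highs]")
    widths = [hi - lo for lo, hi in ranges]   # m_k = max_k - min_k
    return widths, target - sum(lows)

def from_zero_based(choices, ranges):
    """Add each low bound back to the 0-based choices n_k."""
    return [n + lo for n, (lo, _) in zip(choices, ranges)]

widths, T = to_zero_based([(10, 30), (20, 40), (30, 50)], 90)
print(widths, T)  # [20, 20, 20] 30
```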
Order the sums lexicographically. All sums beginning with n_1=0 (if any) come first, then the n_1=1 sums, and so on. Within each of those groups, the sums are sorted by n_2, then by n_3, and so on. Call the target T. Suppose you know how many sums add up to T, and how many of them start with n_1=0, 1, 2, .... Then you can find the starting number n_1 of sum number S in that list. After that, you can reduce the problem to choosing n_2+n_3+... to add up to T-n_1. In the reduced problem you look for sum number S - (number of original sums whose first number is less than n_1).
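For small cases the ordering can be checked by brute force. `itertools.product` over ascending ranges yields tuples in lexicographic order, so filtering it by the sum gives exactly the list described above.

The sketch below uses the same 0-based maxima (5, 7, 3) and T=10 as the walk-through further down. The name `lex_tuples` is hypothetical.

```python
from itertools import product

def lex_tuples(maxima, T):
    """All tuples with 0 <= n_k <= maxima[k] and sum T, in lexicographic order."""
    return [t for t in product(*(range(m + 1) for m in maxima)) if sum(t) == T]

tuples = lex_tuples((5, 7, 3), 10)
print(len(tuples))   # 18
print(tuples[9])     # (3, 7, 0): sum number S=9 starts with n_1 = 3

# How many sums start with each value of n_1
starts = [sum(1 for t in tuples if t[0] == n1) for n1 in range(6)]
print(starts)        # [1, 2, 3, 4, 4, 4]
```

This enumeration is exponential in the number of choices. It is only useful as a reference to test a faster method against.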
Let pulse(n) be a list of n+1 ones: (n+1)*[1] in Python terms. Let max_k, min_k be the limits for the k'th choice, and let m_k = max_k-min_k be the upper limit for the 0-based choice. Then there are 1+m_1 different "sums" from the choice of the first number, and pulse(m_1) gives the distribution: there is 1 way to make each sum from 0 to m_1. For the first two choices, there are m_1+m_2+1 different sums, and it turns out that the convolution of pulse(m_1) with pulse(m_2) gives their distribution.
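The convolution claim can be checked directly for the first two choices. This sketch uses a plain, general-purpose discrete convolution, not the specialised `stepconv` below; the helper name `convolve` is hypothetical. It compares the result with a brute-force count of n_1 + n_2 for m_1 = 5 and m_2 = 7.

```python
from collections import Counter

def convolve(a, b):
    """General discrete convolution: out[i] = sum over j of a[j] * b[i - j]."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

m1, m2 = 5, 7                      # 0-based upper limits of the first two choices
by_convolution = convolve([1] * (m1 + 1), [1] * (m2 + 1))   # pulse(m1) * pulse(m2)

# Brute force: how many pairs (n_1, n_2) give each sum s?
counts = Counter(n1 + n2 for n1 in range(m1 + 1) for n2 in range(m2 + 1))
by_counting = [counts[s] for s in range(m1 + m2 + 1)]

print(by_convolution)                 # [1, 2, 3, 4, 5, 6, 6, 6, 5, 4, 3, 2, 1]
print(by_convolution == by_counting)  # True
```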
Time to stop for some code:
def pulse(width, value=1):
    ''' Returns a vector of (width+1) copies of value (integer ones by default). '''
    return (width+1)*[value]

def stepconv(vector, width):
    ''' Computes the discrete convolution of vector with a "unit"
        pulse of given width.

        Formula: result[i] = Sum[j=0 to width] 1*vector[i-j]
        Where 0 <= i <= len(vector)+width-1, and the "1*" is the value
        of the implied unit pulse function: pulse[j] = 1 for 0<=j<=width.
    '''
    result = width*[0] + vector
    # Entries from i on are still the zero-padded input, so each slice
    # sums the original values vector[i-width .. i]
    for i in range(len(vector)):
        result[i] = sum(result[i:i+width+1])
    # Tail of the output: only the last few input values overlap the pulse
    for i in range(len(vector), len(result)):
        result[i] = sum(result[i:])
    return result
That's coded specifically for only doing convolutions with a "pulse" array, so every linear combination in the convolution is just a sum.
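Each output entry of `stepconv` re-sums up to width+1 values, so building one table costs about len(vector)*width additions. For ranges like the question's [3000,5000] (width 2000), a running-sum variant may be worth considering. This is an alternative sketch, not the answer's code; the name `stepconv_prefix` is hypothetical. It should return the same list as `stepconv` in O(len(vector) + width) time.

```python
from itertools import accumulate

def stepconv_prefix(vector, width):
    """Same result as stepconv(vector, width), computed with prefix sums.

    result[i] = sum of vector[j] for max(0, i-width) <= j <= min(i, len(vector)-1)
    """
    n = len(vector)
    prefix = [0] + list(accumulate(vector))   # prefix[j] == sum(vector[:j])
    result = []
    for i in range(n + width):
        lo = max(i - width, 0)                # first input index under the pulse
        hi = min(i, n - 1) + 1                # one past the last one
        result.append(prefix[hi] - prefix[lo])
    return result

print(stepconv_prefix([1] * 6, 7))  # [1, 2, 3, 4, 5, 6, 6, 6, 5, 4, 3, 2, 1]
```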
Those are used only when building the tables of the final class solution (in setrange(), which the constructor calls):
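import random

class ConstrainedRandom(object):
    def __init__(self, ranges=None, target=None, seed=None):
        self._rand = random.Random(seed)
        if ranges is not None:
            self.setrange(ranges)
        if target is not None:
            self.settarget(target)

    def setrange(self, ranges):
        # ranges: list of inclusive (low, high) integer pairs
        self._ranges = ranges
        self._nranges = len(self._ranges)
        self._nmin, self._nmax = zip(*self._ranges)
        self._minsum = sum(self._nmin)
        self._maxsum = sum(self._nmax)
        self._zmax = [y - x for x, y in self._ranges]   # 0-based upper limits m_k
        # _rconv[k][s] = number of 0-based choices (n_k, ..., n_last) summing to s
        self._rconv = self._nranges * [None]
        self._rconv[-1] = pulse(self._zmax[-1])
        for k in range(self._nranges - 1, 0, -1):
            self._rconv[k-1] = stepconv(self._rconv[k], self._zmax[k-1])

    def settarget(self, target):
        self._target = target

    def next(self, target=None):
        k = target if target is not None else self._target
        k = k - self._minsum                  # 0-based target T
        if not 0 <= k < len(self._rconv[0]):
            raise ValueError("target must lie between the sums of the lows and highs")
        N = self._rconv[0][k]                 # number of tuples adding up to T
        seq = self._rand.randint(0, N - 1)    # pick one of them uniformly
        result = self._nranges * [0]
        for i in range(len(result) - 1):
            cv = self._rconv[i+1]             # sum distribution of the remaining choices
            r_i = 0
            # Skip values of r_i that leave more than the remaining choices can reach
            while k >= len(cv):
                r_i += 1
                k -= 1
            # Count down through the groups of sums that start with r_i
            while cv[k] <= seq:
                seq -= cv[k]
                r_i += 1
                k -= 1
            result[i] = r_i
        result[-1] = k   # the last choice is whatever remains of the target
        return [x + y for x, y in zip(result, self._nmin)]
# end class ConstrainedRandom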
Use that with:
ranges = [(low, high), (low, high), ...]
cr = ConstrainedRandom(ranges, target)
seq = cr.next()
print(seq)
assert sum(seq) == target
seq = cr.next()   # ...then get the next one.
...etc. The class could be trimmed down a bit, but the main space overhead is the _rconv list, which holds the stored convolutions. That's roughly N*T/2 entries, where N is the number of ranges, for O(NT) storage.
The convolutions depend only on the ranges. So when a lot of random tuples are generated under the same constraints, the table construction time "amortizes away" to almost nothing per sample. Each call to .next() takes O(T) steps, roughly T/2 on average, counted as indexing operations into the _rconv lists.
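The storage estimate can be put in numbers, since the total length of the `_rconv` tables depends only on the widths. The table for choices k..N-1 has 1 + (sum of their widths m_k) entries.

The sketch below applies this to the walk-through example. It also applies it to a problem shaped like the question's, assuming 100 ranges of width 2000 taken from the rough figures given there. The helper `rconv_entries` is hypothetical. For problems of that size, the entries themselves are tuple counts with hundreds of digits, stored as arbitrary-precision Python integers.

```python
def rconv_entries(widths):
    """Total number of integers stored across all _rconv tables."""
    total, tail = 0, 0
    for w in reversed(widths):
        tail += w              # sum of widths of this choice and all later ones
        total += tail + 1      # length of the table for this suffix
    return total

print(rconv_entries([5, 7, 3]))      # 31  (table lengths 4, 11, 16)
print(rconv_entries([2000] * 100))   # 10100100
```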
To see how the algorithm works, assume a sequence of 3 zero-based choices with max values (5,7,3) and a 0-based target T=10. Define or import the pulse and stepconv functions in an IDLE session, then:
>>> pulse(5)
[1, 1, 1, 1, 1, 1]
>>> K1 = pulse (5)
>>> K2 = stepconv(K1, 7)
>>> K3 = stepconv(K2, 3)
>>> K1
[1, 1, 1, 1, 1, 1]
>>> K2
[1, 2, 3, 4, 5, 6, 6, 6, 5, 4, 3, 2, 1]
>>> K3
[1, 3, 6, 10, 14, 18, 21, 23, 23, 21, 18, 14, 10, 6, 3, 1]
>>> K3[10]
18
>>> sum(K3)
192
>>> (5+1)*(7+1)*(3+1)
192
K3[i] shows the number of different choices n_1, n_2, n_3 such that 0 <= n_k <= m_k and Σ n_k = i. Let * mean convolution when applied to two of these lists. Then pulse(m_2)*pulse(m_3) gives the distribution of sums of n_2 and n_3:
>>> R23 = stepconv(pulse(7),3)
>>> R23
[1, 2, 3, 4, 4, 4, 4, 4, 3, 2, 1]
>>> len(R23)
11
Every value from 0 to T=10 is (barely) possible as a sum of n_2 and n_3. So any choice is possible for the first number, and there are R23[T-n_1] possible triplets adding up to T=10 that start with n_1. Once you've found that there are 18 possible triplets adding up to 10, generate a random integer S in 0..17 (for example with random.randrange(18)). Then count down through the R23[T:T-m_1-1:-1] array:
>>> R23[10:10-5-1:-1]
[1, 2, 3, 4, 4, 4]
>>> sum(R23[10:10-5-1:-1])
18
Note the sum of that list is the total computed in K3[10] above. A sanity check. Anyway, if S==9 was the random choice, then find how many leading terms of that array can be summed without exceeding S. That's the value of n_1. In this case 1+2+3 <= S but 1+2+3+4 > S, so n_1 is 3.
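The countdown for n_1 can be written as a small function; the name `pick_first` is hypothetical. It takes the counts in the order they are scanned, i.e. R23[T], R23[T-1], .... It returns both n_1 and the leftover index that is carried into the reduced problem. It mirrors the second inner `while cv[k] <= seq` loop of `next()`.

```python
def pick_first(counts, S):
    """Count down through `counts` (number of sums for n_1 = 0, 1, 2, ...).

    Returns (n_1, S_rest): n_1 is how many leading counts can be summed
    without exceeding S, and S_rest is S minus that running total, i.e. the
    index of the chosen sum within the group that starts with n_1.
    """
    if not 0 <= S < sum(counts):
        raise ValueError("S must be in range(sum(counts))")
    n1 = 0
    for c in counts:
        if c > S:
            break
        S -= c
        n1 += 1
    return n1, S

counts = [1, 2, 3, 4, 4, 4]     # R23[10:10-5-1:-1] from the session above
print(pick_first(counts, 9))    # (3, 3)
```

Here S_rest = 3 indexes into the R23[7] = 4 triplets that start with n_1 = 3. That is the reduced problem n_2 + n_3 = 7.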
As described above, you can then reduce the problem to find n_2. The final number (n_3 in this example) will be uniquely determined.