Fitting For Discrete Data: Negative Binomial, Poisson, Geometric Distribution
Question
In scipy, the discrete distributions have no fit method for estimating parameters from data. I know there are a lot of threads about this.
For example, if I have an array like the one below:
x = [2,3,4,5,6,7,0,1,1,0,1,8,10,9,1,1,1,0,0]
I cannot apply fit to this array:
from scipy.stats import nbinom
param = nbinom.fit(x)
But I would like to ask for an up-to-date answer: is there any way to fit these three discrete distributions and then choose the best fit for a discrete dataset?
Recommended answer
You can use the Method of Moments to fit any particular distribution.
Basic idea: compute the empirical first, second, etc. moments, then derive the distribution parameters from these moments.
So, in all these cases we need at most two moments. Let's get them:
import pandas as pd
# for other distributions, you'll need to implement PMF
from scipy.stats import nbinom, poisson, geom
x = pd.Series(x)
mean = x.mean()
var = x.var()
likelihoods = {} # we'll use it later
Note: I used pandas instead of numpy. That is because numpy's var() and std() don't apply Bessel's correction by default (ddof=0), while pandas' do (ddof=1). If you have 100+ samples, there shouldn't be much difference, but on smaller samples it could be important.
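To make the difference concrete, here is a minimal sketch on the array from the question. The variable names are illustrative only. Passing ddof=1 to numpy should give the same sample variance as the pandas default.

```python
import numpy as np
import pandas as pd

data = [2, 3, 4, 5, 6, 7, 0, 1, 1, 0, 1, 8, 10, 9, 1, 1, 1, 0, 0]

var_population = np.var(data)       # numpy default: ddof=0, divides by n
var_sample = np.var(data, ddof=1)   # Bessel's correction: divides by n - 1
var_pandas = pd.Series(data).var()  # pandas default: ddof=1

print(var_population, var_sample, var_pandas)
```

With only 19 observations the two estimates differ by a factor of 19/18, which carries straight through to the moment-based parameter estimates.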
Now, let's get the parameters for these distributions. The negative binomial has two parameters: p and r. Let's estimate them and calculate the likelihood of the dataset:
# From the wikipedia page, we have:
# mean = pr / (1-p)
# var = pr / (1-p)**2
# without wiki, you could use MGF to get moments; too long to explain here
# Solving for p and r, we get:
p = 1 - mean / var # TODO: check for zero variance and limit p by [0, 1]
r = (1-p) * mean / p
UPD: Wikipedia and scipy use different definitions of p: one treats it as the probability of success, the other as the probability of failure. So, to be consistent with the scipy convention, use:
p = mean / var
r = p * mean / (1-p)
End of UPD
Calculate the likelihood:
likelihoods['nbinom'] = x.map(lambda val: nbinom.pmf(val, r, p)).prod()
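As a sanity check on the scipy-style formulas, the sketch below recomputes p and r from the question's array and confirms that the fitted distribution reproduces the sample mean and variance. The guard for var <= mean is an addition of mine, not part of the original answer. Without overdispersion the moment equations have no valid solution, because p would fall outside (0, 1).

```python
import numpy as np
from scipy.stats import nbinom

data = np.array([2, 3, 4, 5, 6, 7, 0, 1, 1, 0, 1, 8, 10, 9, 1, 1, 1, 0, 0])
mean = data.mean()
var = data.var(ddof=1)  # sample variance, matching the pandas default

# Assumption: treat var <= mean as "negative binomial not applicable".
if var <= mean:
    raise ValueError("variance must exceed the mean to match negative binomial moments")

p = mean / var          # scipy's p
r = p * mean / (1 - p)  # scipy's n (may be non-integer)

# The moments of the fitted distribution should equal the sample moments.
fitted_mean, fitted_var = nbinom.stats(r, p, moments="mv")
print(p, r, float(fitted_mean), float(fitted_var))
```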
Same for Poisson, which has only one parameter:
# from Wikipedia,
# mean = variance = lambda. Nothing to solve here
lambda_ = mean
likelihoods['poisson'] = x.map(lambda val: poisson.pmf(val, lambda_)).prod()
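Same for the geometric distribution:
# mean = 1 / p # this form fits the scipy definition (support k = 1, 2, ...)
# note: under this definition every zero in the data has pmf 0, so the likelihood is 0
p = 1 / mean
likelihoods['geometric'] = x.map(lambda val: geom.pmf(val, p)).prod()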
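The array in the question contains zeros, which scipy's geom (support starting at 1) cannot produce. One possible workaround, sketched here as an assumption rather than as part of the original answer, is to model the number of failures before the first success instead. That variant has support 0, 1, 2, ... and mean (1 - p) / p, and scipy can express it with loc=-1.

```python
import numpy as np
from scipy.stats import geom

data = np.array([2, 3, 4, 5, 6, 7, 0, 1, 1, 0, 1, 8, 10, 9, 1, 1, 1, 0, 0])
mean = data.mean()

# Shifted variant: mean = (1 - p) / p, so p = 1 / (1 + mean).
p_shifted = 1 / (1 + mean)

# loc=-1 moves scipy's geom from support {1, 2, ...} to {0, 1, ...}.
likelihood_shifted = geom.pmf(data, p_shifted, loc=-1).prod()

# The unshifted version assigns probability 0 to each zero in the data.
likelihood_unshifted = geom.pmf(data, 1 / mean).prod()

print(likelihood_shifted, likelihood_unshifted)
```

Whether the shifted form is the right model depends on what the counts represent. It is shown only so that the geometric candidate is not ruled out purely by the choice of support.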
Finally, let's get the best fit:
best_fit = max(likelihoods, key=likelihoods.get)
print("Best fit:", best_fit)
print("Likelihood:", likelihoods[best_fit])
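Products of many small probabilities can underflow to 0.0 on larger datasets, and raw likelihoods tend to favour models with more parameters. A hedged alternative, not part of the original answer, is to sum log-probabilities with logpmf and compare candidates by AIC (2k - 2 log L, lower is better). The sketch below does this for two of the candidates, using the same moment-based estimates.

```python
import numpy as np
from scipy.stats import nbinom, poisson

data = np.array([2, 3, 4, 5, 6, 7, 0, 1, 1, 0, 1, 8, 10, 9, 1, 1, 1, 0, 0])
mean = data.mean()
var = data.var(ddof=1)

# Moment-based estimates in scipy's parameterization.
p = mean / var
r = p * mean / (1 - p)

# (log-likelihood, number of estimated parameters) for each candidate
candidates = {
    "nbinom": (nbinom.logpmf(data, r, p).sum(), 2),
    "poisson": (poisson.logpmf(data, mean).sum(), 1),
}

# AIC penalizes each extra parameter; the smallest value wins.
aic = {name: 2 * k - 2 * loglik for name, (loglik, k) in candidates.items()}
best_fit = min(aic, key=aic.get)

print("Best fit by AIC:", best_fit)
print(aic)
```

Since the parameters here come from the method of moments rather than from maximizing the likelihood, the AIC values are approximate and best read as a rough ranking.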
Let me know if you have any questions.