Fitting For Discrete Data: Negative Binomial, Poisson, Geometric Distribution


Problem Description

In scipy there is no support for fitting discrete distributions using data. I know there are a lot of questions about this.

For example, if I have an array like below:

x = [2,3,4,5,6,7,0,1,1,0,1,8,10,9,1,1,1,0,0]

I cannot apply this to the array:

from scipy.stats import nbinom
param = nbinom.fit(x)  # fails: scipy's discrete distributions have no fit() method

But I would like to ask, as of today: is there any way to fit these three discrete distributions and then choose the best fit for the discrete dataset?
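As a side note: newer SciPy releases (1.9 and later) do provide a generic `scipy.stats.fit` that works with discrete distributions via maximum likelihood. A minimal sketch, assuming such a SciPy version is available (the bounds for the shape parameters are illustrative):

```python
from scipy import stats

x = [2, 3, 4, 5, 6, 7, 0, 1, 1, 0, 1, 8, 10, 9, 1, 1, 1, 0, 0]

# bounds for the nbinom shape parameters n and p; loc stays fixed at 0
bounds = [(0, 30), (0, 1)]
res = stats.fit(stats.nbinom, x, bounds)
print(res.params)  # FitParams(n=..., p=..., loc=0.0)
```

The rest of the answer below uses the Method of Moments instead, which works on any SciPy version.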

Recommended Answer

You can use the Method of Moments to fit any particular distribution.

Basic idea: get empirical first, second, etc. moments, then derive distribution parameters from these moments.

So, in all these cases we only need two moments. Let's get them:

import pandas as pd
# for other distributions, you'll need to implement PMF
from scipy.stats import nbinom, poisson, geom

x = pd.Series(x)
mean = x.mean()
var = x.var()
likelihoods = {}  # we'll use it later

Note: I used pandas instead of numpy. That is because numpy's var() and std() don't apply Bessel's correction, while pandas' do. If you have 100+ samples, there shouldn't be much difference, but on smaller samples it could be important.
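The difference is easy to check directly; a quick sketch (the sample data below is illustrative):

```python
import numpy as np
import pandas as pd

data = [1, 2, 3, 4]

biased = np.var(data)             # divides by n (no Bessel's correction): 1.25
unbiased = pd.Series(data).var()  # divides by n - 1 (Bessel's correction): ~1.667

# numpy matches pandas once you pass ddof=1
assert abs(np.var(data, ddof=1) - unbiased) < 1e-12
```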

Now, let's get parameters for these distributions. Negative binomial has two parameters: p, r. Let's estimate them and calculate the likelihood of the dataset:

# From the wikipedia page, we have:
# mean = pr / (1-p)
# var = pr / (1-p)**2
# without wiki, you could use MGF to get moments; too long to explain here
# Solving for p and r, we get:

p = 1 - mean / var  # TODO: check for zero variance and limit p by [0, 1]
r = (1-p) * mean / p

UPD: Wikipedia and scipy use different definitions of p: one treats it as the probability of success, the other as the probability of failure. So, to be consistent with the scipy notion, use:

p = mean / var
r = p * mean / (1-p)

End of UPD
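One way to sanity-check the scipy parameterization is to plug the estimated r and p back into nbinom.stats and confirm the theoretical moments match the empirical ones. A small sketch with made-up empirical moments (any pair with var > mean works):

```python
from scipy.stats import nbinom

mean, var = 2.5789, 9.2573  # hypothetical empirical moments, var > mean

# scipy-style estimates, as in the UPD above
p = mean / var
r = p * mean / (1 - p)

# theoretical mean and variance under the fitted parameters
m, v = nbinom.stats(r, p, moments='mv')
assert abs(m - mean) < 1e-8
assert abs(v - var) < 1e-8
```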

Calculate the likelihood:

likelihoods['nbinom'] = x.map(lambda val: nbinom.pmf(val, r, p)).prod()

Same for Poisson; there is only one parameter:

# from Wikipedia,
# mean = variance = lambda. Nothing to solve here
lambda_ = mean
likelihoods['poisson'] = x.map(lambda val: poisson.pmf(val, lambda_)).prod()

Same for the geometric distribution:

# scipy's geom has support k = 1, 2, ..., so geom.pmf(0, p) == 0 and any zero
# in the data zeroes out the whole product. Since x contains zeros, shift the
# support to 0, 1, 2, ... with loc=-1; the mean is then (1-p)/p, so:
p = 1 / (mean + 1)

likelihoods['geometric'] = x.map(lambda val: geom.pmf(val, p, loc=-1)).prod()

Finally, let's get the best fit:

best_fit = max(likelihoods, key=likelihoods.get)  # avoids shadowing the data variable x
print("Best fit:", best_fit)
print("Likelihood:", likelihoods[best_fit])
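One practical caveat, not part of the original answer: multiplying many pmf values can underflow to 0.0 on larger datasets, at which point every candidate ties at zero. Summing logpmf values carries the same ranking information without underflow; a sketch for the Poisson case:

```python
import pandas as pd
from scipy.stats import poisson

x = pd.Series([2, 3, 4, 5, 6, 7, 0, 1, 1, 0, 1, 8, 10, 9, 1, 1, 1, 0, 0])
lambda_ = x.mean()

# log-likelihood: sum of log pmf values instead of a product of pmf values
log_likelihood = poisson.logpmf(x, lambda_).sum()
print(log_likelihood)
```

Comparing log-likelihoods across the three candidates picks the same winner as comparing the raw products, since log is monotonic.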

Let me know if you have any questions.
