
Python: Generate random values from empirical distribution




In Java, I usually rely on the org.apache.commons.math3.random.EmpiricalDistribution class to do the following:

  • Derive a probability distribution from observed data.
  • Generate random values from this distribution.

Is there any Python library that provides the same functionality? It seems like scipy.stats.gaussian_kde.resample does something similar, but I'm not sure if it implements the same procedure as the Java type I'm familiar with.


import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

# This represents the original "empirical" sample -- I fake it by
# sampling from a normal distribution
orig_sample_data = np.random.normal(size=10000)

# Generate a KDE from the empirical sample
sample_pdf = scipy.stats.gaussian_kde(orig_sample_data)

# Sample new datapoints from the KDE
new_sample_data = sample_pdf.resample(10000).T[:,0]

# Histogram of initial empirical sample
cnts, bins, p = plt.hist(orig_sample_data, label='original sample', bins=100,
                         histtype='step', linewidth=1.5, density=True)

# Histogram of datapoints sampled from KDE
plt.hist(new_sample_data, label='sample from KDE', bins=bins,
         histtype='step', linewidth=1.5, density=True)

# Visualize the KDE itself
y_kde = sample_pdf(bins)
plt.plot(bins, y_kde, label='KDE')

# Show the labels assigned above and display the figure
plt.legend()
plt.show()

new_sample_data should be drawn from roughly the same distribution as the original data (to the degree that the KDE is a good approximation to the original distribution).
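
One way to check that claim numerically, rather than only by eye on the histogram, is a two-sample Kolmogorov-Smirnov comparison of the original and resampled data. The following is a minimal sketch, not part of the original answer. It assumes a SciPy version in which gaussian_kde.resample accepts a seed argument, and the fixed seeds are only there to make the run repeatable.

```python
import numpy as np
import scipy.stats

# Fake an "empirical" sample, as in the example above (seeded for repeatability)
rng = np.random.default_rng(0)
orig_sample_data = rng.normal(size=10000)

# Fit the KDE and draw the same number of new points from it.
# resample returns an array of shape (n_dims, size); row 0 is the 1-D sample.
sample_pdf = scipy.stats.gaussian_kde(orig_sample_data)
new_sample_data = sample_pdf.resample(10000, seed=1)[0]

# Two-sample KS statistic: the largest gap between the two empirical CDFs.
# A small value means the two samples are distributed similarly.
ks_stat, p_value = scipy.stats.ks_2samp(orig_sample_data, new_sample_data)

# The KDE adds kernel noise to each resampled point, so the resampled data
# is expected to be slightly wider than the original sample.
std_ratio = new_sample_data.std() / orig_sample_data.std()

print(f"KS statistic: {ks_stat:.4f} (p-value {p_value:.3f})")
print(f"std ratio (resampled / original): {std_ratio:.4f}")
```

Because the KDE smooths the data, the resampled values are usually slightly more spread out than the original sample, so a std ratio a little above 1 is expected and does not indicate a problem.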


