Search code examples
pythonnumpyscipymontecarlo

Random Number from Histogram


Suppose I create a histogram using scipy/numpy, so I have two arrays: one for the bin counts, and one for the bin edges. If I use the histogram to represent a probability distribution function, how can I efficiently generate random numbers from that distribution?


Solution

  • It's probably what np.random.choice does in @Ophion's answer, but you can construct a normalized cumulative density function, then choose based on a uniform random number:

    from __future__ import division
    import numpy as np
    import matplotlib.pyplot as plt
    
    data = np.random.normal(size=1000)
    hist, bins = np.histogram(data, bins=50)
    
    bin_midpoints = bins[:-1] + np.diff(bins)/2
    cdf = np.cumsum(hist)
    cdf = cdf / cdf[-1]
    values = np.random.rand(10000)
    value_bins = np.searchsorted(cdf, values)
    random_from_cdf = bin_midpoints[value_bins]
    
    plt.subplot(121)
    plt.hist(data, 50)
    plt.subplot(122)
    plt.hist(random_from_cdf, 50)
    plt.show()
    

    enter image description here


    A 2D case can be done as follows:

    data = np.column_stack((np.random.normal(scale=10, size=1000),
                            np.random.normal(scale=20, size=1000)))
    x, y = data.T                        
    hist, x_bins, y_bins = np.histogram2d(x, y, bins=(50, 50))
    x_bin_midpoints = x_bins[:-1] + np.diff(x_bins)/2
    y_bin_midpoints = y_bins[:-1] + np.diff(y_bins)/2
    cdf = np.cumsum(hist.ravel())
    cdf = cdf / cdf[-1]
    
    values = np.random.rand(10000)
    value_bins = np.searchsorted(cdf, values)
    x_idx, y_idx = np.unravel_index(value_bins,
                                    (len(x_bin_midpoints),
                                     len(y_bin_midpoints)))
    random_from_cdf = np.column_stack((x_bin_midpoints[x_idx],
                                       y_bin_midpoints[y_idx]))
    new_x, new_y = random_from_cdf.T
    
    plt.subplot(121, aspect='equal')
    plt.hist2d(x, y, bins=(50, 50))
    plt.subplot(122, aspect='equal')
    plt.hist2d(new_x, new_y, bins=(50, 50))
    plt.show()
    

    enter image description here