Search code examples
pythonnormalizationnormal-distribution

Converting gaussian to histogram


I'm running a model of particles, and I want to have initial conditions for the particle locations mimicking a gaussian distribution. If I have N number of particles on 1D grid from -10 to 10, I want them to be distributed on the grid according to a gaussian with a known mean and standard deviation. It's basically creating a histogram where each bin width is 1 (the x-axis of locations resolution is 1), and the frequency of each bin should be how many particles are in it, which should all add up to N.

My strategy was to plot a gaussian function on the x-axis grid, and then just approximate the value of each point for the number of particles:

def gaussian(x, mu, sig):
    return 1./(np.sqrt(2.*np.pi)*sig)*np.exp(-np.power((x - mu)/sig, 2.)/2)

mean = 0
sigma = 1

x_values = np.arange(-10, 10, 1)
y = gaussian(x_values, mean, sigma)

However, I have normalization issues (the sum doesn't add up to N), and the number of particles in each point should be an integer (I thought about converting the y array to integers but again, because of the normalization issue I get a flat line).

Usually, the problem is fitting a gaussian to histogram, but in my case, I need to do the reverse - and I couldn't find a solution for it yet. I will appreciate any help! Thank you!!!


Solution

  • You can use numpy.random.normal to sample this distribution. You can get N points inside range (-10, 10) that follows Gaussian distribution with the following code.

    import numpy as np
    import matplotlib.pyplot as plt
    
    N = 10000
    mean = 5
    sigma = 3
    
    bin_edges = np.arange(-10, 11, 1)
    x_values = (bin_edges[1:] + bin_edges[:-1]) / 2
    
    
    points = np.random.normal(mean, sigma, N * 10)
    mask = np.logical_and(points < 10, points > -10)
    points = points[mask]  # drop points outside range
    points = points[:N]  # only use the first N points
    y, _ = np.histogram(points, bins=bin_edges)
    
    
    plt.scatter(x_values, y)
    plt.show()
    

    The idea is to generate a lot of random numbers (10 N in the code), and ignores the points outside your desired range.