Tags: python, matplotlib, normal-distribution

Scaling a normal distribution in Python


I want to plot a histogram of normally distributed data and overlay the corresponding normal distribution curve on it. Many examples online do this with the y-axis normalized via density=True. In my case, I want the histogram to show raw counts, so I need to draw the normal distribution curve without density normalization. This may really be a mathematical question, but I could not figure out how to "un-normalize" the distribution curve. Here is my code:

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

mu = 1e-3
std = 1.0e-4
nsize = 10000
ymax = 5000

# Generate some data for this demonstration.
data = norm.rvs(mu, std, size=nsize)

# Plot the histogram.
plt.hist(data, bins=20, color='b', edgecolor='black')

# Plot the PDF.
xmin, xmax = [0.5e-3, 1.5e-3] #plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mu, std)                      # something to do with this line
plt.plot(x, p, 'k', linewidth=2)
plt.axvline(mu, linestyle='dashed', color='black')
plt.ylim([0, ymax])
plt.show()

This produces the following plot:

[plot image not available]

As can be seen, the histogram bar heights are counts that sum to 10000 (nsize), the number of data points, whereas the PDF curve integrates to 1, so its values are on a completely different scale. How can I make the curve match the histogram?
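A quick numerical check makes the mismatch concrete. This is a minimal sketch reusing the question's parameters; the `random_state=0` seed is an assumption added only for reproducibility, and `np.histogram` is used because it computes the same counts and edges that `plt.hist` draws. The counts sum to `nsize`, the PDF integrates to about 1, and the factor separating the two is roughly `nsize * bin_width`.

```python
import numpy as np
from scipy.stats import norm

mu, std, nsize = 1e-3, 1.0e-4, 10000
# random_state is added here only to make the check reproducible.
data = norm.rvs(mu, std, size=nsize, random_state=0)

# np.histogram computes the same counts and edges that plt.hist would draw.
counts, edges = np.histogram(data, bins=20)
bin_width = edges[1] - edges[0]

# The bar heights (counts) add up to the number of samples ...
total_count = counts.sum()

# ... whereas the PDF integrates to 1 (trapezoidal rule on a fine grid
# covering +/- 8 standard deviations around the mean).
x = np.linspace(mu - 8 * std, mu + 8 * std, 2001)
f = norm.pdf(x, mu, std)
pdf_area = np.sum((f[1:] + f[:-1]) / 2 * np.diff(x))

# Factor that converts the PDF into "expected samples per bin".
scale = nsize * bin_width

print(total_count, round(pdf_area, 4), scale)
```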


Solution

  • plt.hist returns the bin counts (hist), which sum to nsize, so we can simply scale p to the same total:

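    from scipy.stats import norm
    import matplotlib.pyplot as plt

    mu, std, nsize = 1e-3, 1.0e-4, 10000
    data = norm.rvs(mu, std, size=nsize)

    # Plot the histogram; plt.hist returns the counts, the bin edges and the patches.
    hist, bins, _ = plt.hist(data, bins=20, color='b', edgecolor='black')

    # Changes here: evaluate the PDF at the bin edges and rescale it so that
    # its values sum to nsize, the same total as the histogram counts.
    p = norm.pdf(bins, mu, std)
    plt.plot(bins, p / p.sum() * nsize, 'r', linewidth=2)
    plt.show()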
    

    Output:

    [plot image not available]
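A variant of the same idea, offered as a sketch rather than the answerer's method, scales the PDF by the expected number of samples per bin: with equal-width bins, a bin centred at x holds roughly nsize * bin_width * pdf(x) points. Dividing by `p.sum()` in the answer approximates this, because the PDF summed over equally spaced edges is roughly 1 / bin_width. Scaling explicitly lets the curve be drawn on a fine grid independent of the bin edges. The helper name `expected_counts` is hypothetical, and the `random_state=0` seed is added only for reproducibility.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu, std, nsize = 1e-3, 1.0e-4, 10000
data = norm.rvs(mu, std, size=nsize, random_state=0)  # seed only for reproducibility


def expected_counts(x, mu, std, n, bin_width):
    """Hypothetical helper: PDF rescaled to 'samples per bin' of width bin_width."""
    return n * bin_width * norm.pdf(x, mu, std)


# Draw the histogram and keep the counts and bin edges it used.
counts, edges, _ = plt.hist(data, bins=20, color='b', edgecolor='black')
bin_width = edges[1] - edges[0]  # bins=20 gives equally wide bins

# Evaluate the scaled curve on a fine grid spanning the histogram.
x = np.linspace(edges[0], edges[-1], 200)
plt.plot(x, expected_counts(x, mu, std, nsize, bin_width), 'r', linewidth=2)
plt.axvline(mu, linestyle='dashed', color='black')
plt.show()
```

Because the scale factor is explicit, this also works when the curve is drawn over a wider range than the data, for example the question's fixed 0.5e-3 to 1.5e-3 window.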