Search code examples
pythonscipystatisticsprobability-density

Scipy normal PDF summing to values larger than 1


I'm working with scipy.stats.norm PDFs. I'm wondering if I'm doing something wrong (maybe yes), since the sum of the PDF over the normal distribution is returning values larger than one, and, by definition, a probability density function should sum up to 1. I would be ok if it would sum up to ~0.99 (since I'm not summing all values in the interval) but 20 isn't much acceptable. A minimum example is shown below

from scipy.stats import norm
lower_bound = norm.ppf(0.01)
upper_bound = norm.ppf(0.99)
N = 100
x = np.linspace(lower_bound, upper_bound, 100)
P = norm.pdf(x) 
print(np.sum(P))

Solution

  • Your definition should be tweaked slightly: by definition, the integral of the probability density function over the entire space should be equal to 1. If you want to approximate this integral using a Riemann sum, you should multiply the values by the spacing between your x values. You will then see that the result is much closer to 1:

    >>> import numpy as np
    >>> from scipy.stats import norm
    >>> lower_bound = norm.ppf(0.01)
    >>> upper_bound = norm.ppf(0.99)
    >>> N = 100
    >>> x, step = np.linspace(lower_bound, upper_bound, N, retstep=True)
    >>> P = norm.pdf(x)
    >>> print(np.sum(P * step))
    0.9812297466603901
    

    The exact result (which you can more closely approximate by increasing N) should be 0.98, the difference between the q values passed to norm.ppf.