Search code examples
pythonmatplotlibprobabilitycontourkernel-density

Python plotting percentile contour lines of a probability distribution


Given a probability distribution with unknown functional form (example below), I like to plot "percentile-based" contour lines, i.e.,those that correspond to regions with an integral of 10%, 20%, ..., 90% etc.

## example of an "arbitrary" probability distribution ##
from matplotlib.mlab import bivariate_normal
import matplotlib.pyplot as plt
import numpy as np

X, Y = np.mgrid[-3:3:100j, -3:3:100j]
z1 = bivariate_normal(X, Y, .5, .5, 0., 0.)
z2 = bivariate_normal(X, Y, .4, .4, .5, .5)
z3 = bivariate_normal(X, Y, .6, .2, -1.5, 0.)
z = z1+z2+z3
plt.imshow(np.reshape(z.T, (100,-1)), origin='lower', extent=[-3,3,-3,3])
plt.show()

enter image description here I've looked into multiple approaches, from using the default contour function in matplotlib, methods involving stats.gaussian_kde in scipy, and even perhaps generating random point samples from the distribution and estimating a kernel afterwards. None of them appears to provide the solution.


Solution

  • Look at the integral of p(x) inside the contour p(x) ≥ t and solve for the desired value of t:

    import matplotlib
    from matplotlib.mlab import bivariate_normal
    import matplotlib.pyplot as plt
    import numpy as np
    
    X, Y = np.mgrid[-3:3:100j, -3:3:100j]
    z1 = bivariate_normal(X, Y, .5, .5, 0., 0.)
    z2 = bivariate_normal(X, Y, .4, .4, .5, .5)
    z3 = bivariate_normal(X, Y, .6, .2, -1.5, 0.)
    z = z1 + z2 + z3
    z = z / z.sum()
    
    n = 1000
    t = np.linspace(0, z.max(), n)
    integral = ((z >= t[:, None, None]) * z).sum(axis=(1,2))
    
    from scipy import interpolate
    f = interpolate.interp1d(integral, t)
    t_contours = f(np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]))
    plt.imshow(z.T, origin='lower', extent=[-3,3,-3,3], cmap="gray")
    plt.contour(z.T, t_contours, extent=[-3,3,-3,3])
    plt.show()
    

    enter image description here