I have points which include the probability on the y-axis and values on the x-axis, like:
p1 =
[[0.0, 0.0001430560406790707],
[10.0, 6.2797052001508247e-13],
[15.0, 4.8114669550502021e-06],
[20.0, 0.0007443231772534647],
[25.0, 0.00061070912573869406],
[30.0, 0.48116582167944905],
[35.0, 0.24698643991977953],
[40.0, 0.016407283121225951],
[45.0, 0.2557158314329116],
[50.0, 1.1252231121357235e-05],
[55.0, 0.064666668633158647],
[60.0, 1.7631447655837744e-17],
[65.0, 1.1294722466816786e-14],
[70.0, 2.9419020411134367e-16],
[75.0, 3.0887653014525822e-17],
[80.0, 4.4973693062706866e-17],
[85.0, 9.0975358174005147e-15],
[90.0, 1.0758266454985257e-10],
[95.0, 7.2923752473657924e-08],
[100.0, 1.8065366882584036e-08]]
p2 =
[[0.0, 4.1652247577331996e-06],
[10.0, 1.2212829713673957e-06],
[15.0, 6.5906857192417344e-08],
[20.0, 0.00016745946587138236],
[25.0, 0.0054431111796765554],
[30.0, 0.0067575214586160616],
[35.0, 0.00011856110316632124],
[40.0, 0.00032181662132509944],
[45.0, 0.001397981055516994],
[50.0, 0.0027058954834684062],
[55.0, 2.553142406703067e-06],
[60.0, 1.1514033594755017e-08],
[65.0, 0.21961568282994792],
[70.0, 2.4658349829099807e-08],
[75.0, 0.0022850986575076743],
[80.0, 3.5603047823624507e-06],
[85.0, 0.99406392082894734],
[90.0, 0.24399923235645221],
[95.0, 0.0013470125217945798],
[100.0, 0.042582366972883985]]
Now I want to generate a probability distribution from the points, where the x-axis values are (0,10,15,20,...,100) and the y-axis values contain the probabilities (0.00014,....)
When I plot p1 with the plt.plot function:
plt.plot([item[0] for item in p1], [item[1] for item in p1])
And for p2:
plt.plot([item[0] for item in p2], [item[1] for item in p2])
I want a smoother visualization, like a probability distribution. If a probability distribution is not possible, a smoothing spline would also work.
Scipy's gaussian_kde is often used to smoothly approximate a probability distribution. It sums a Gaussian kernel for each input point. Usually individual measurements are used as inputs, but the weights parameter allows working with binned data. The resulting function is normalized so that its integral equals one.
This approach assumes the values of p1 and p2 are meant as a mean for the segment around each x-value, similar to a histogram. That is, the data are treated as a step function in which each x-value marks the end of a step.
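To make the two claims above concrete (a sum of Gaussian kernels, normalized to integrate to one), here is a minimal sketch on toy data. The arrays `centers` and `mass` are hypothetical values invented for illustration, not taken from p1 or p2, and the hand-written kernel sum assumes the kernel width can be read from the `covariance` attribute of the fitted object.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Toy binned data (hypothetical): bin centers and the weight of each bin
centers = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
mass = np.array([0.05, 0.20, 0.50, 0.20, 0.05])

kde = gaussian_kde(centers, bw_method=0.25, weights=mass)

# Evaluate on a grid wide enough to contain the tails of all kernels
grid = np.linspace(-50, 110, 4001)
density = kde(grid)

# Trapezoid rule written out by hand: the area should be close to 1
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))

# Rebuild the curve as an explicit weighted sum of Gaussian kernels
sigma = float(np.sqrt(kde.covariance[0, 0]))  # kernel standard deviation
w = mass / mass.sum()                         # weights normalized to sum to 1
z = (grid[None, :] - centers[:, None]) / sigma
manual = (w[:, None] * np.exp(-0.5 * z**2)).sum(axis=0) / (sigma * np.sqrt(2 * np.pi))
matches_manual_sum = bool(np.allclose(manual, density))

# Rescaling all weights by a constant should leave the curve unchanged
kde_scaled = gaussian_kde(centers, bw_method=0.25, weights=10 * mass)
same_curve = bool(np.allclose(density, kde_scaled(grid)))

print(f"area={area:.4f}, manual sum matches={matches_manual_sum}, "
      f"scale invariant={same_curve}")
```

Because the weights are normalized internally, p1 and p2 do not need to sum to one before being passed as weights. Only their relative sizes matter.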
# Full example using the data from the question
from matplotlib import pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
p1 = np.array([[0.0, 0.0001430560406790707],
[10.0, 6.2797052001508247e-13],
[15.0, 4.8114669550502021e-06],
[20.0, 0.0007443231772534647],
[25.0, 0.00061070912573869406],
[30.0, 0.48116582167944905],
[35.0, 0.24698643991977953],
[40.0, 0.016407283121225951],
[45.0, 0.2557158314329116],
[50.0, 1.1252231121357235e-05],
[55.0, 0.064666668633158647],
[60.0, 1.7631447655837744e-17],
[65.0, 1.1294722466816786e-14],
[70.0, 2.9419020411134367e-16],
[75.0, 3.0887653014525822e-17],
[80.0, 4.4973693062706866e-17],
[85.0, 9.0975358174005147e-15],
[90.0, 1.0758266454985257e-10],
[95.0, 7.2923752473657924e-08],
[100.0, 1.8065366882584036e-08]])
p2 = np.array([[0.0, 4.1652247577331996e-06],
[10.0, 1.2212829713673957e-06],
[15.0, 6.5906857192417344e-08],
[20.0, 0.00016745946587138236],
[25.0, 0.0054431111796765554],
[30.0, 0.0067575214586160616],
[35.0, 0.00011856110316632124],
[40.0, 0.00032181662132509944],
[45.0, 0.001397981055516994],
[50.0, 0.0027058954834684062],
[55.0, 2.553142406703067e-06],
[60.0, 1.1514033594755017e-08],
[65.0, 0.21961568282994792],
[70.0, 2.4658349829099807e-08],
[75.0, 0.0022850986575076743],
[80.0, 3.5603047823624507e-06],
[85.0, 0.99406392082894734],
[90.0, 0.24399923235645221],
[95.0, 0.0013470125217945798],
[100.0, 0.042582366972883985]])
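x = np.linspace(0, 100, 1000)
fig, axes = plt.subplots(ncols=2)
for ax, p in zip(axes, [p1, p2]):
    p[0, 0] = 5.0  # let each x-value be the end of a segment
    # dotted step plot of the raw values, as a histogram-like reference
    ax.step(p[:, 0], p[:, 1], color='dodgerblue', lw=1, ls=':', where='pre')
    ax2 = ax.twinx()  # second y-axis, since the normalized KDE has a different scale
    # shift by 2.5 so each kernel sits at the center of its segment
    kde = gaussian_kde(p[:, 0] - 2.5, bw_method=.25, weights=p[:, 1])
    ax2.plot(x, kde(x), color='crimson')
plt.show()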
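The question also asks for a spline as a fallback. The sketch below is one possible option, not part of the approach above: it uses scipy's PchipInterpolator, which is a shape-preserving interpolating cubic rather than a true smoothing spline. It is chosen here because an ordinary cubic spline can overshoot below zero next to sharp peaks, while PCHIP stays monotone between neighboring points. The array `pts` is hypothetical toy data in the same [x, probability] layout as p1 and p2.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Toy points (hypothetical), same [x, probability] layout as p1 and p2
pts = np.array([[0.0, 0.0],
                [10.0, 0.001],
                [15.0, 0.02],
                [20.0, 0.45],
                [25.0, 0.30],
                [30.0, 0.02],
                [35.0, 0.0]])

# Shape-preserving cubic interpolant through all points
curve = PchipInterpolator(pts[:, 0], pts[:, 1])

# Dense grid for a smooth-looking line
xs = np.linspace(pts[0, 0], pts[-1, 0], 500)
ys = curve(xs)

# The curve passes through the data and does not overshoot its range
hits_points = bool(np.allclose(curve(pts[:, 0]), pts[:, 1]))
stays_in_range = bool(ys.min() >= -1e-12 and ys.max() <= pts[:, 1].max() + 1e-12)

print(f"passes through points={hits_points}, stays within data range={stays_in_range}")
```

Plotting `xs` against `ys` with plt.plot gives the smoothed line. Unlike the KDE, this curve is not normalized, so it should be read as a smoothed view of the points rather than as a probability density.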