I have some data that is normally distributed and to which I have fitted a pdf. However, I want to get the probability of the likelihood of a given value from the dataset occurring. From my understanding, this is the area of the bin under the pdf for where the value of x lies. Is there a numpy or scipy.stats function to generate this? I have looked but either Im not seeing it or my lack of understanding is holding me back. So far I have:
import h5py
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.mlab as mlab
import scipy.stats as stats
import numpy
import math
a = 'data.h5'
f = h5py.File(a,'r')
dset = f['/DATA/DATA/']
values = dset[...,0]
I can then generate a histogram of this data and fit a pdf to it:
n, bins, patches = plt.hist(values, 50, normed=1)
mu = np.mean(values)
sigma = np.std(values)
plt.plot(bins, mlab.normpdf(bins, mu, sigma))
plt.show()
And I can retrieve the f(x) for a given value of x (in this case 0.65)
print(stats.norm.pdf(0.65, np.mean(mb1), np.std(mb1)))
Can someone help me generate my probability from this?
I have attached the outputted histogram with pdf.
What you would ideally want to do is integrate over the probability density function in the range of the event you want the probability for. Here's some code:
import numpy as np
import scipy.stats as ss
a = ss.norm.rvs(4, 2, 40)
hist(a, normed=True)
xs = np.linspace(0, 10, 30)
plot(xs, ss.norm.pdf(xs, 4, 2), label='pdf')
plot(xs, ss.norm.cdf(xs, 4, 2), label='cdf')
Which produces a normal distribution centered at the value of 4 with a sigma value of 2. The figure below traces the pdf with the red line and cdf with the purple line. The cdf is simply the integral of the pdf from negative infinity to the value at which it is calculated. Thus to get the integral of the pdf over a range, you simply have to subtract the cdf values at the two end points of the range.
Now you can ask what is the probability of seeing a value between -100 and 4?
print ss.norm.cdf(4, 4, 2) - ss.norm.cdf(-100, 4, 2)
Which will result in the expected answer of 0.5
, which corresponds to (pretty much) half of the entire distribution. So in your case, you may be interested in the probability of seeing a value between 0.60 and 0.70:
print ss.norm.cdf(0.70, 4, 2) - ss.norm.cdf(0.60, 4, 2)
Which should result in the small probability of:
0.00490600527511
I should note that the 'probability' of 0.65 itself is meaningless since you have a continuous probability distribution and the exact value of 0.65 is an infinitesimally small part of it so its probability is 0.