I am trying to find the probability of an event of a random variable past a specific value, i.e. pr(x>a), where a is some constant, typically much higher than the average of x, and x is not of any standard Gaussian distribution. So I wanted to fit some other probability density function, and take the integral of the pdf of x from a to inf. As this is a problem of modelling the spikes, I considered this an Extreme Value analysis problem, and found that the Weibull distribution might be appropriate.
Regarding extreme value distributions, the Weibull distribution has a very "not-easy-to-implement" integral, and I therefore figured I could just get the pdf from Scipy, and do a Riemann-sum. I also thought that I could as well simply evaluate the kernel density, get the pdf, and do the same with the Riemann sum, to approximate the integral.
I found a Q here on Stack which provided a neat method for doing Riemann sums in Python, and I adapted that code to fit my problem. But when I evaluate the integral I get weird numbers, indicating that something is either wrong with the KDE, or the Riemann sum-function.
Two scenarios, the first with the Weibull, in accordance with the Scipy documentation:
x = theData
x_grid = np.linspace(0,np.max(x),len(x))
p = ss.weibull_min.fit(x[x!=0], floc=0)
pd = ss.weibull_min.pdf(x_grid,p[0], p[1], p[2])
which looks like this:
and then also tried the KDE method as follows
pd = ss.gaussian_kde(x).pdf(x_grid)
which I subsequently run through the following function:
def riemannSum(a, b, n):
dx = (b - a) / n
s = 0.0
x = a
for i in range(n):
s += pd[x]
x += dx
return s * dx
print(riemannSum(950.0, 1612.0, 10000))
print(riemannSum(0.0, 1612.0, 100000))
In the case of the Weibull, it gives me
>> 0.272502150549
>> 18.2860384829
and in the case of the KDE, I get
>> 0.448450460469
>> 18.2796021034
This is obviously wrong. Taking the integral of the entire thing should give me 1, and 18.2+ is quite far off.
Am I wrong in my assumptions of what I can do with these density functions? Or have I made some mistake in the Riemann sum function
the Weibull distribution has a very "not-easy-to-implement" integral
Huh?!
Weibull distribution has very well defined CDF, so implementing integral is pretty much one-liner (ok, make it two for clarity)
def WeibullCDF(x, lmbd, k):
q = pow(x/lmbd, k)
return 1.0 - exp(-q)
And, of course, there is ss.weibull_min.cdf(x_grid,p[0], p[1], p[2])
if you want to pick from standard library