Search code examples
pythonintegralprobability-densityweibull

Riemann sum of a probability density


I am trying to find the probability of an event of a random variable past a specific value, i.e. pr(x>a), where a is some constant, typically much higher than the average of x, and x is not of any standard Gaussian distribution. So I wanted to fit some other probability density function, and take the integral of the pdf of x from a to inf. As this is a problem of modelling the spikes, I considered this an Extreme Value analysis problem, and found that the Weibull distribution might be appropriate.

Regarding extreme value distributions, the Weibull distribution has a very "not-easy-to-implement" integral, and I therefore figured I could just get the pdf from Scipy, and do a Riemann-sum. I also thought that I could as well simply evaluate the kernel density, get the pdf, and do the same with the Riemann sum, to approximate the integral.

I found a Q here on Stack which provided a neat method for doing Riemann sums in Python, and I adapted that code to fit my problem. But when I evaluate the integral I get weird numbers, indicating that something is either wrong with the KDE, or the Riemann sum-function.

Two scenarios, the first with the Weibull, in accordance with the Scipy documentation:

x = theData
x_grid = np.linspace(0,np.max(x),len(x))

p = ss.weibull_min.fit(x[x!=0], floc=0)
pd = ss.weibull_min.pdf(x_grid,p[0], p[1], p[2])

which looks like this:

enter image description here

and then also tried the KDE method as follows

pd = ss.gaussian_kde(x).pdf(x_grid)

which I subsequently run through the following function:

def riemannSum(a, b, n):
    dx = (b - a) / n
    s = 0.0
    x = a
    for i in range(n): 
        s += pd[x]
        x += dx
    return s * dx          
print(riemannSum(950.0, 1612.0, 10000))
print(riemannSum(0.0, 1612.0, 100000))

In the case of the Weibull, it gives me

>> 0.272502150549
>> 18.2860384829

and in the case of the KDE, I get

>> 0.448450460469
>> 18.2796021034

This is obviously wrong. Taking the integral of the entire thing should give me 1, and 18.2+ is quite far off.

Am I wrong in my assumptions of what I can do with these density functions? Or have I made some mistake in the Riemann sum function


Solution

  • the Weibull distribution has a very "not-easy-to-implement" integral

    Huh?!

    Weibull distribution has very well defined CDF, so implementing integral is pretty much one-liner (ok, make it two for clarity)

    def WeibullCDF(x, lmbd, k):
        q = pow(x/lmbd, k)
        return 1.0 - exp(-q)
    

    And, of course, there is ss.weibull_min.cdf(x_grid,p[0], p[1], p[2]) if you want to pick from standard library