Search code examples
pythonweibull

Why do i have this gap in my plots during a weibull fitting with a 3 parameter pdf function?


I want to fit data to a 3 parameter weibull distribution, but on my plot i have always a gap between the original data and the fits. What do i am wrong? And why do i get such a worst fit when i use these initial_params = [1,1,1]?

Here is my code:

import numpy as np  
from scipy.optimize import minimize  
from scipy.special import gamma  
import matplotlib.pyplot as plt  
from scipy.stats import weibull_min

def weibull_log_likelihood(params, data):
    shape, scale, loc = params
    log_likelihood = -np.sum(weibull_min.logpdf(data, shape, loc=loc, scale=scale))
    return -log_likelihood  

def estimate_weibull_params(data):
    initial_params = [shape, scale, loc] 
    bounds=[(1, None), (1, None), (1, None)]
    result = minimize(weibull_log_likelihood, initial_params, args=(data,), method='nelder-mead', bounds=bounds) 
    return result.x  

def weibull_pdf(x, shape, scale, loc):
    return (shape / scale) * ((x - loc) / scale) ** (shape - 1) * np.exp(-((x - loc) / scale) ** shape)

shape = 7.5
scale = 150
loc = 350
size = 100
data = weibull_min.rvs(shape, loc=loc, scale=scale, size=size)
   
estimated_params = estimate_weibull_params(data)
shape, scale, loc = estimated_params

print(f"Estimated Parameters: Shape = {shape}, Scale = {scale}, Location = {loc}")

x = np.arange(1000)
pdf = weibull_pdf(x, shape, scale, loc)  
plt.hist(data, bins=20, density=True, alpha=0.6, color='g')  
plt.plot(x, pdf, 'r-', lw=2)  
plt.xlim(0, 1000)  
plt.show()  

enter image description here


Solution

  • The Weibull distribution is a very extreme distribution. It involves powers of powers, which rapidly grow out of control.

    If you change your log_likelihood function to what is effectively log(-log(pdf)) then you may get better results:

    def weibull_log_likelihood(params, data):
        shape, scale, loc = params
        log_minus_log_likelihood = np.sum(np.log( -weibull_min.logpdf(data, shape, loc=loc, scale=scale)))
        return log_minus_log_likelihood
    

    I recommend that you also improve the initial guess to something more in line with the data:

    initial_params = [ 1, np.max( data ) - np.min( data ), np.min( data ) ]
    

    Note that I think this is not a particularly good way of finding parameter loc, because the Weibull distribution involves a term ((x-loc)/scale)^shape, and a free guess at loc could allow you to take a negative number to a non-integer power.

    Estimated Parameters: Shape = 5.383509295480495, Scale = 98.8928064025068, Location = 395.6676730395027
    

    (Obviously, given your random data and the relatively small sample size you will get different numbers each time for these.)

    enter image description here