I want to fit data to a 3 parameter weibull distribution, but on my plot i have always a gap between the original data and the fits. What do i am wrong? And why do i get such a worst fit when i use these initial_params = [1,1,1]?
Here is my code:
import numpy as np
from scipy.optimize import minimize
from scipy.special import gamma
import matplotlib.pyplot as plt
from scipy.stats import weibull_min
def weibull_log_likelihood(params, data):
shape, scale, loc = params
log_likelihood = -np.sum(weibull_min.logpdf(data, shape, loc=loc, scale=scale))
return -log_likelihood
def estimate_weibull_params(data):
initial_params = [shape, scale, loc]
bounds=[(1, None), (1, None), (1, None)]
result = minimize(weibull_log_likelihood, initial_params, args=(data,), method='nelder-mead', bounds=bounds)
return result.x
def weibull_pdf(x, shape, scale, loc):
return (shape / scale) * ((x - loc) / scale) ** (shape - 1) * np.exp(-((x - loc) / scale) ** shape)
shape = 7.5
scale = 150
loc = 350
size = 100
data = weibull_min.rvs(shape, loc=loc, scale=scale, size=size)
estimated_params = estimate_weibull_params(data)
shape, scale, loc = estimated_params
print(f"Estimated Parameters: Shape = {shape}, Scale = {scale}, Location = {loc}")
x = np.arange(1000)
pdf = weibull_pdf(x, shape, scale, loc)
plt.hist(data, bins=20, density=True, alpha=0.6, color='g')
plt.plot(x, pdf, 'r-', lw=2)
plt.xlim(0, 1000)
plt.show()
The Weibull distribution is a very extreme distribution. It involves powers of powers, which rapidly grow out of control.
If you change your log_likelihood function to what is effectively log(-log(pdf)) then you may get better results:
def weibull_log_likelihood(params, data):
shape, scale, loc = params
log_minus_log_likelihood = np.sum(np.log( -weibull_min.logpdf(data, shape, loc=loc, scale=scale)))
return log_minus_log_likelihood
I recommend that you also improve the initial guess to something more in line with the data:
initial_params = [ 1, np.max( data ) - np.min( data ), np.min( data ) ]
Note that I think this is not a particularly good way of finding parameter loc
, because the Weibull distribution involves a term ((x-loc)/scale)^shape,
and a free guess at loc could allow you to take a negative number to a non-integer power.
Estimated Parameters: Shape = 5.383509295480495, Scale = 98.8928064025068, Location = 395.6676730395027
(Obviously, given your random data and the relatively small sample size you will get different numbers each time for these.)