I have manually fitted a lognormal distribution to my data:
from scipy.stats import lognorm
sigma = 0.15
mu = 2
x_fit = np.linspace(x.min(), x.max(), 100)
y_fit = lognorm.pdf(x_fit, sigma, scale=np.exp(mu))
fig, ax = plt.subplots()
plt.plot(x_fit, y_fit, color='red')
Problem is that scipy does not converge to sigma = 0.15, but it only converges to mu ~ 2. Plus, I am getting some inf as well... Note: This is occurring even if I use [2, 0.15] as starting guess:
from scipy.optimize import curve_fit
y, x = np.histogram(z_data, bins=100)
x = (x[1:]+x[:-1])/2 # bin centers
def fit_lognormal(x, mu, sigma):
return lognorm.pdf(x, sigma, scale=np.exp(mu))
p0 = [2, 0.15]
params = curve_fit(fit_lognormal, x, y, p0=p0)
params
This returns:
(array([1.97713319e+00, 6.98238900e-05]),
array([[inf, inf],
[inf, inf]]))
You forgot to scale your histogram to have an unitary area using the switch density
:
np.histogram(x, density=1.)
Lets say you dataset is the following:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats, optimize
np.random.seed(123456)
sigma = 0.15
mu = 2
law = stats.lognorm(s=sigma, scale=np.exp(mu))
N = 30_000
x = law.rvs(size=N)
density, bins = np.histogram(x, density=1., bins=30) # Mind the density=1.
centers = (bins[1:] + bins[:-1]) / 2.
You can fit the parameters in three easy ways:
Natively with stats
:
stats.lognorm.fit(x)
(0.151297129052073, 0.053788977866467386, 7.335249847775119)
Where loc ~ 0
and scale ~ np.exp(2)
.
Using curve_fit
to fit the PDF:
def model(x, mu, sigma):
return stats.lognorm.pdf(x, s=sigma, scale=np.exp(mu))
popt, pcov = optimize.curve_fit(model, centers, density)
# (array([2.00059957, 0.15158351]),
# array([[8.87310596e-07, 4.50003479e-08],
# [4.50003479e-08, 5.93772296e-07]]))
Using MLE and minimize
:
def factory(x):
def wrapped(p):
return - np.sum(stats.lognorm.logpdf(x, s=p[1], scale=np.exp(p[0])))
return wrapped
likelihood = factory(x)
sol = optimize.minimize(likelihood, x0=[1, 1])
# fun: 45693.61136284961
# hess_inv: array([[ 3.38213795e-08, -1.68343490e-08],
# [-1.68343490e-08, 5.54538464e-07]])
# jac: array([-0.00048828, 0.00097656])
# message: 'Desired error not necessarily achieved due to precision loss.'
# nfev: 66
# nit: 17
# njev: 22
# status: 2
# success: False
# x: array([2.00008072, 0.15018327])