python scipy curve-fitting distribution scipy-optimize

scipy lognorm does not converge to params

I have manually fitted a lognormal distribution to my data:

from scipy.stats import lognorm

sigma = 0.15
mu = 2
x_fit = np.linspace(x.min(), x.max(), 100)
y_fit = lognorm.pdf(x_fit, sigma, scale=np.exp(mu))
fig, ax = plt.subplots()
plt.plot(x_fit, y_fit, color='red')

Problem is that scipy does not converge to sigma = 0.15, but it only converges to mu ~ 2. Plus, I am getting some inf as well... Note: This is occurring even if I use [2, 0.15] as starting guess:

from scipy.optimize import curve_fit

y, x = np.histogram(z_data, bins=100)
x = (x[1:]+x[:-1])/2 # bin centers

def fit_lognormal(x, mu, sigma):
    return lognorm.pdf(x, sigma, scale=np.exp(mu))

p0 = [2, 0.15]
params = curve_fit(fit_lognormal, x, y, p0=p0)
params

This returns:

(array([1.97713319e+00, 6.98238900e-05]),
 array([[inf, inf],
        [inf, inf]]))

Solution

You forgot to scale your histogram to have an unitary area using the switch density:

np.histogram(x, density=1.)

Lets say you dataset is the following:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats, optimize

np.random.seed(123456)
sigma = 0.15
mu = 2

law = stats.lognorm(s=sigma, scale=np.exp(mu))
N = 30_000
x = law.rvs(size=N)

density, bins = np.histogram(x, density=1., bins=30)  # Mind the density=1.
centers = (bins[1:] + bins[:-1]) / 2.

You can fit the parameters in three easy ways:

Natively with stats:

stats.lognorm.fit(x)
(0.151297129052073, 0.053788977866467386, 7.335249847775119)

Where loc ~ 0 and scale ~ np.exp(2).

Using curve_fit to fit the PDF:

def model(x, mu, sigma):
    return stats.lognorm.pdf(x, s=sigma, scale=np.exp(mu))

popt, pcov = optimize.curve_fit(model, centers, density)
# (array([2.00059957, 0.15158351]),
#  array([[8.87310596e-07, 4.50003479e-08],
#        [4.50003479e-08, 5.93772296e-07]]))

Using MLE and minimize:

def factory(x):
    def wrapped(p):
        return - np.sum(stats.lognorm.logpdf(x, s=p[1], scale=np.exp(p[0])))
    return wrapped

likelihood = factory(x)

sol = optimize.minimize(likelihood, x0=[1, 1])
#     fun: 45693.61136284961
#  hess_inv: array([[ 3.38213795e-08, -1.68343490e-08],
#        [-1.68343490e-08,  5.54538464e-07]])
#       jac: array([-0.00048828,  0.00097656])
#   message: 'Desired error not necessarily achieved due to precision loss.'
#      nfev: 66
#       nit: 17
#      njev: 22
#    status: 2
#   success: False
#         x: array([2.00008072, 0.15018327])