Search code examples
pythonscipycurve-fittingdistributionscipy-optimize

scipy lognorm does not converge to params


I have manually fitted a lognormal distribution to my data:

from scipy.stats import lognorm

sigma = 0.15
mu = 2
x_fit = np.linspace(x.min(), x.max(), 100)
y_fit = lognorm.pdf(x_fit, sigma, scale=np.exp(mu))
fig, ax = plt.subplots()
plt.plot(x_fit, y_fit, color='red')

Histo

Problem is that scipy does not converge to sigma = 0.15, but it only converges to mu ~ 2. Plus, I am getting some inf as well... Note: This is occurring even if I use [2, 0.15] as starting guess:

from scipy.optimize import curve_fit

y, x = np.histogram(z_data, bins=100)
x = (x[1:]+x[:-1])/2 # bin centers

def fit_lognormal(x, mu, sigma):
    return lognorm.pdf(x, sigma, scale=np.exp(mu))

p0 = [2, 0.15]
params = curve_fit(fit_lognormal, x, y, p0=p0)
params

This returns:

(array([1.97713319e+00, 6.98238900e-05]),
 array([[inf, inf],
        [inf, inf]]))

Solution

  • You forgot to scale your histogram to have an unitary area using the switch density:

    np.histogram(x, density=1.)
    

    Lets say you dataset is the following:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats, optimize
    
    np.random.seed(123456)
    sigma = 0.15
    mu = 2
    
    law = stats.lognorm(s=sigma, scale=np.exp(mu))
    N = 30_000
    x = law.rvs(size=N)
    
    density, bins = np.histogram(x, density=1., bins=30)  # Mind the density=1.
    centers = (bins[1:] + bins[:-1]) / 2.
    

    You can fit the parameters in three easy ways:

    Natively with stats:

    stats.lognorm.fit(x)
    (0.151297129052073, 0.053788977866467386, 7.335249847775119)
    

    Where loc ~ 0 and scale ~ np.exp(2).

    Using curve_fit to fit the PDF:

    def model(x, mu, sigma):
        return stats.lognorm.pdf(x, s=sigma, scale=np.exp(mu))
    
    popt, pcov = optimize.curve_fit(model, centers, density)
    # (array([2.00059957, 0.15158351]),
    #  array([[8.87310596e-07, 4.50003479e-08],
    #        [4.50003479e-08, 5.93772296e-07]])) 
    

    Using MLE and minimize:

    def factory(x):
        def wrapped(p):
            return - np.sum(stats.lognorm.logpdf(x, s=p[1], scale=np.exp(p[0])))
        return wrapped
    
    likelihood = factory(x)
    
    sol = optimize.minimize(likelihood, x0=[1, 1])
    #     fun: 45693.61136284961
    #  hess_inv: array([[ 3.38213795e-08, -1.68343490e-08],
    #        [-1.68343490e-08,  5.54538464e-07]])
    #       jac: array([-0.00048828,  0.00097656])
    #   message: 'Desired error not necessarily achieved due to precision loss.'
    #      nfev: 66
    #       nit: 17
    #      njev: 22
    #    status: 2
    #   success: False
    #         x: array([2.00008072, 0.15018327])
    

    enter image description here