Search code examples
pythonscipystatisticsdistributiondata-fitting

Lognorm distribution fitting


I am trying to do a lognorm distribution fit but the resulting paramter seem a bit odd. Could you please show me my mistake or explain to me if I am misinterpreting the parameters.

import numpy as np
import scipy.stats as st

data = np.array([1050000, 1100000, 1230000, 1300000, 1450000, 1459785, 1654000, 1888000])
s, loc, scale = st.lognorm.fit(data)
#calculating the mean
lognorm_mean = st.lognorm.mean(s = s, loc = loc, scale = scale)

The resulting mean is: 945853602904015.8.
But this doesn't make any sense.

The mean should be:

data_ln = np.log(data)
ln_mean = np.mean(data_ln)
ln_std = np.std(data_ln)
mean = np.exp(ln_mean + np.power(ln_std, 2)/2)

Here the resulting mean is 1391226.31. This should be correct.

Can you please help me with this topic?

Best regards
Norbi


Solution

  • I think you can tune the parameters of the minimizer to get an acceptable result:

    import numpy as np
    import scipy.stats as st
    from scipy.optimize import minimize
    
    data = np.array([1050000, 1100000, 1230000, 1300000, 
                     1450000, 1459785, 1654000, 1888000])
    
    def opti_wrap(fun, x0, args, disp=0, **kwargs):
        return minimize(fun, x0, args=args, method='SLSQP', 
                        tol=1e-12, options={'maxiter': 1000}).x
    
    s, loc, scale = st.lognorm.fit(data, optimizer=opti_wrap)
    lognorm_mean = st.lognorm.mean(s=s, loc=loc, scale=scale)
    print(lognorm_mean)  # should give 1392684.4350
    

    The reason you are seeing a strange result is due to the default minimizer failing to converge on the maximum likelihood result. This could be due to a mis-behaving cost function with so few data points (you are trying to fit 3 params but only have 8 data points...). Note: I'm using scipy version 1.1.0.