Tags: scipy, normal-distribution, data-fitting, probability-density, scipy.stats

How to fit data with a log-normal distribution using norm.fit() in SciPy


I am trying to use scipy.stats norm.fit(), with some modifications, to fit data with a log-normal distribution, and I want to verify the result by also fitting the data with scipy.stats lognorm.fit(). The two results are similar, but they should be identical. (A plot is at the link below.)

https://ibb.co/PxHWSNp

To fit a log-normal distribution with norm.fit(), I pass log(x) to norm.fit() and then divide the resulting pdf by x. The reasoning comes from the two formulas below: the log-normal pdf differs from the normal pdf only in the ln(x) and 1/x terms. Could anybody help me identify what I am doing wrong?

  1. log-normal distribution pdf: https://ibb.co/Zd9J17T

  2. normal distribution pdf: https://ibb.co/Mgvpv31
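
For reference, and assuming the linked images show the standard textbook forms, the two densities and the relation between them can be written as follows. Here μ and σ denote the mean and standard deviation of ln(x), and the subscripts N and LN are labels introduced only for this note.

```latex
\begin{align}
f_{\mathrm{N}}(y;\mu,\sigma) &= \frac{1}{\sigma\sqrt{2\pi}}
    \exp\!\left(-\frac{(y-\mu)^2}{2\sigma^2}\right) \\
f_{\mathrm{LN}}(x;\mu,\sigma) &= \frac{1}{x\,\sigma\sqrt{2\pi}}
    \exp\!\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right)
    = \frac{f_{\mathrm{N}}(\ln x;\mu,\sigma)}{x}, \qquad x > 0
\end{align}
```

The 1/x factor is the Jacobian of the substitution y = ln(x), which is why dividing the normal pdf evaluated at ln(x) by x gives a density in x.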

    # x runs from the center of the leftmost bar to the center of the rightmost bar
    x = np.linspace(left_boundary, right_boundary, 1000)

    # data1 is the original data
    data2 = np.log(data1)

    params1 = lognorm.fit(data1, method='MLE', loc=0)
    plt.plot(x, lognorm.pdf(x, params1[0], params1[1], params1[2]),
             label='Log-Normal fitting using lognorm.fit()')

    params2 = norm.fit(data2, method='MLE', loc=0)
    # Here I bring in the log(x) and divide the pdf by x
    plt.plot(x, norm.pdf(np.log(x), params2[0], params2[1])/x,
             label='Log-Normal fitting using norm.fit()')

The reason I want to do this is to check whether I can use the same approach to fit a log-Pearson3 distribution, since I cannot find a log-Pearson3 implementation in any library. Thank you.


Solution

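  • Currently, you are passing loc=0 to lognorm.fit. This only provides an initial guess for the loc parameter; to fix it at zero, pass floc=0 instead. The formula you are comparing against has no location shift, so leaving loc free turns lognorm.fit into a three-parameter fit, while norm.fit on log(x) has only two parameters. With loc fixed at zero, the two fits agree exactly.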

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import lognorm, norm
    
    # generate random data
    rng = np.random.default_rng(845834582458345)
    data1 = np.exp(rng.normal(size=100))
    data2 = np.log(data1)
    
    # generate abscissae for PDF (start just above zero, since log(0) is undefined)
    x = np.linspace(0.01, 10, 300)
    
    # Fit the data using lognorm, fixing `loc` parameter to zero
    params1 = lognorm.fit(data1, method='MLE', floc=0)
    plt.plot(x, lognorm.pdf(x, *params1), '-',
             label='Log-Normal fitting using lognorm.fit()')
    
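    # Fit a normal distribution to the log-data; norm.fit uses a closed-form
    # MLE, so an initial `loc` guess is not needed
    params2 = norm.fit(data2, method='MLE')
    # Evaluate the normal pdf at log(x) and divide by x to get the log-normal pdf
    plt.plot(x, norm.pdf(np.log(x), *params2)/x, '--',
             label='Log-Normal fitting using norm.fit()')
    plt.legend()
    plt.show()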
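
One way to check the agreement numerically, rather than by eye, is to compare the fitted parameters directly. The sketch below relies on SciPy's parameterization, in which lognorm with loc=0 has shape s equal to the standard deviation of log(x) and scale equal to the exponential of its mean. The seed, sample size, and variable names are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import lognorm, norm

# Synthetic log-normal sample (illustrative data, not the asker's)
rng = np.random.default_rng(0)
data1 = np.exp(rng.normal(size=100))
data2 = np.log(data1)

# lognorm.fit returns (s, loc, scale); floc=0 pins the location at zero
s, loc, scale = lognorm.fit(data1, floc=0)
# norm.fit returns (mu, sigma) of the log-data
mu, sigma = norm.fit(data2)

# Compare the two pdfs on a grid that avoids x = 0
x = np.linspace(0.01, 10, 300)
pdf_lognorm = lognorm.pdf(x, s, loc, scale)
pdf_norm = norm.pdf(np.log(x), mu, sigma) / x

print("s - sigma        :", s - sigma)
print("scale - exp(mu)  :", scale - np.exp(mu))
print("max |pdf diff|   :", np.max(np.abs(pdf_lognorm - pdf_norm)))
```

All three printed differences should be at or near floating-point noise. With loc=0 passed instead of floc=0, the location is fitted freely and the differences are generally no longer negligible.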
    

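
The same change of variables should carry over to the log-Pearson3 case mentioned in the question. The following is a minimal sketch, not a tested library routine. It assumes that scipy.stats.pearson3 (parameters skew, loc, scale) is the Pearson type III form you want. The helper log_pearson3_pdf and the synthetic data are hypothetical, introduced here only for illustration.

```python
import numpy as np
from scipy.stats import pearson3

# Hypothetical positive sample whose logarithm is skewed (illustration only)
rng = np.random.default_rng(1)
data = np.exp(rng.gamma(shape=4.0, scale=0.25, size=500))
log_data = np.log(data)

# Fit Pearson type III to the log-data; returns (skew, loc, scale)
skew, loc, scale = pearson3.fit(log_data)

def log_pearson3_pdf(x, skew, loc, scale):
    """Density of X when ln(X) follows pearson3(skew, loc, scale); needs x > 0."""
    x = np.asarray(x, dtype=float)
    # Same recipe as the log-normal case: evaluate at ln(x), divide by x
    return pearson3.pdf(np.log(x), skew, loc=loc, scale=scale) / x

# Evaluate the fitted density over the range of the data
x = np.linspace(data.min(), data.max(), 1000)
pdf = log_pearson3_pdf(x, skew, loc, scale)
print("fitted (skew, loc, scale):", skew, loc, scale)
```

Unlike the log-normal case, there is no built-in distribution to cross-check against, so it is worth overlaying the curve on a normalized histogram of the data before trusting the fit.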