python numpy simulation curve-fitting distribution

How do I get the parameters of the lognormal distribution (mean, sd) from the 's', 'loc', and 'scale' values output by a distribution fitter?


Here is my code:

    from fitter import Fitter

    model = Fitter(data, distributions='lognorm')
    model.fit()
    print(model.summary())
    model.get_best(method='sumsquare_error')

It outputs this:

{'lognorm': {'s': 0.8612498618921953, 'loc': 0.3175483462045392, 'scale': 0.30617261628597686}}

I want to find the right parameters to plug into SIMIO's Lognormal( normalMean , normalStdDev ), but I'm confused about what s, loc, and scale are and how to convert them to the normalMean and normalStdDev that SIMIO needs.

I read the Wikipedia article on the lognormal distribution, but it only confused me more. I then tried this code:

    logsample = stats.norm.rvs(loc=0.3175483462045392, scale=0.30617261628597686, size=1000)
    sample = np.exp(logsample)
    shape, loc, scale = stats.lognorm.fit(sample, floc=0)
    shape, loc, scale
    np.log(scale), shape

but I am still unsure.


Solution

  • If Y is lognormally distributed with shape s, location 0, and scale scale, then log(Y) is normally distributed with mean np.log(scale) and standard deviation s. (Source: scipy.stats.lognorm)

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats
    rng = np.random.default_rng(3483594639650936)
    
    s = 0.8612498618921953
    loc = 0  # required if we want the log to be normally distributed
    scale = 0.30617261628597686
    lognorm = stats.lognorm(s=s, loc=loc, scale=scale)
    data = lognorm.rvs(size=10000, random_state=rng)
    logdata = np.log(data)
    
    x = np.linspace(logdata.min(), logdata.max(), 300)
    sigma = s
    mu = np.log(scale)
    norm = stats.norm(loc=mu, scale=sigma)
    pdf = norm.pdf(x)
    
    plt.hist(logdata, density=True, bins=50)
    plt.plot(x, pdf)
    plt.show()
    


    The log of a general three-parameter lognormal random variable (with nonzero location) is not normal; after all, it has three independent parameters, whereas the normal distribution has only two. You are welcome to set loc = 1 in the code above and plot the histogram of logdata; it is clearly not normal.
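One way to check this numerically, rather than by eye, is to compare the sample skewness of the log-data for loc = 0 and loc = 1. This is only a rough sketch: the helper name `log_skewness`, the seed, and the sample size are illustrative choices, and skewness is just one convenient indicator of non-normality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
s, scale = 0.8612498618921953, 0.30617261628597686

def log_skewness(loc):
    """Sample skewness of log(X) for X ~ lognorm(s, loc, scale)."""
    dist = stats.lognorm(s=s, loc=loc, scale=scale)
    data = dist.rvs(size=100_000, random_state=rng)
    return float(stats.skew(np.log(data)))

skew_zero_loc = log_skewness(0.0)  # near 0, as expected for normal data
skew_unit_loc = log_skewness(1.0)  # clearly positive: log-data is right-skewed
print(skew_zero_loc, skew_unit_loc)
```

A normal distribution has zero skewness, so a value far from zero for loc = 1 is consistent with the histogram not looking normal.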

    If you want to be able to fit a lognormal distribution to data and transform the parameters to those of the corresponding normal distribution, you need to fix the location of the lognormal distribution to zero when fitting. Using SciPy to perform the fit, that would be:

    s, loc, scale = stats.lognorm.fit(data, floc=0)
    mu, sigma = stats.norm.fit(logdata)
    np.testing.assert_allclose(mu, np.log(scale))
    np.testing.assert_allclose(sigma, s)
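For the SIMIO question specifically, the conversion can be sketched as below. The helper name `normal_params` is hypothetical, and the snippet simply ignores the fitted loc from the question (about 0.318), so the resulting values only describe the data if the distribution is refitted with floc=0, or if the location is handled separately as a shift. The mean and standard deviation of Y itself are computed from the standard lognormal formulas and cross-checked against SciPy.

```python
import numpy as np
from scipy import stats

# Fitted values quoted in the question; note that loc is not zero here.
fitted = {'s': 0.8612498618921953,
          'loc': 0.3175483462045392,
          'scale': 0.30617261628597686}

def normal_params(s, scale):
    """Return (mu, sigma) of log(Y) for Y ~ lognorm(s, loc=0, scale)."""
    return float(np.log(scale)), float(s)

# Candidate normalMean / normalStdDev, valid only under the loc=0 assumption.
mu, sigma = normal_params(fitted['s'], fitted['scale'])

# Mean and standard deviation of Y itself (not of log(Y)).
y_mean = np.exp(mu + sigma**2 / 2)
y_sd = y_mean * np.sqrt(np.exp(sigma**2) - 1)

# Cross-check against SciPy's own moments for the loc=0 distribution.
frozen = stats.lognorm(s=fitted['s'], loc=0, scale=fitted['scale'])
np.testing.assert_allclose(y_mean, frozen.mean())
np.testing.assert_allclose(y_sd, frozen.std())
print(mu, sigma)
```

Here mu comes out at about -1.18 and sigma is just s. Because the original fit reported a nonzero loc, refitting with floc=0 will generally give different s and scale values than the ones shown in the question.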