Search code examples
pythoncurve-fittinglmfit

the use of lmfit ExponentialGaussianModel( )


Trying to fit ExponentialGaussianModel() from lmfit but getting the following error message: The input contains nan values

I am using Jupyternotebook on windows and I am new to python and lmfit. I find the lmfit documentation to be a bit obscure for a beginner and hope to find help here. Following is my code: I would like to generate a exponential-gaussian histogram, extract data points and practice fitting with lmfit library. ( I would like to practice fitting and finding least number of points that would reproduce the parameters used producing the histogram)

from scipy.stats import exponnorm
from lmfit.models import ExponentialGaussianModel

K2 = 1.5
r2 = exponnorm.rvs(K2, size=500, loc = 205, scale = 40)

Q           =  np.histogram(r2,500)
exp_gaus_x  =  Q[1]
exp_gaus_y  =  Q[0]

tof_x       =  exp_gaus_x[1:]
tof_y       =  exp_gaus_y

mod =  ExponentialGaussianModel()
pars = mod.guess(tof_y, x=tof_x)
out  = mod.fit(tof_y, pars, x=tof_x)
(out.fit_report(min_correl=0.25))

I get the error that there are nan input values. I was expecting a report like shown in the manual.


Solution

  • The definition of exponential Gaussian used in lmfit is from https://en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution. With the exponential term being

    exp[center*gamma + (gamma*sigma)**2/2 - gamma*x]

    this has the tendency to give Inf for large-ish values of sigma and gamma, and/or center. I believe that you are getting such Inf values and that this is the cause of the NaN message you are seeing. The fitting routines (in Fortran) do not handle NaN or Inf gracefully (actually, "at all"). This is a real limitation of that particular model. You'll see that the examples on the wikipedia page all have x values much closer to 1 than 200, and gamma and sigma also on the order of 1, not around 50, which is what the automated guess gives.

    I think that a simpler definition for an exponentially modified Gaussian would be better for you. With

    def expgaussian(x, amplitude=1, center=0, sigma=1.0, gamma=1.0):
        """ an alternative exponentially modified Gaussian."""
        dx = center-x
        return amplitude* np.exp(gamma*dx) * erfc( dx/(np.sqrt(2)*sigma))
    

    You'll get a decent fit, though the meaning of the parameters has changed a bit, and you will need to explicitly give starting values, not rely on a guess() procedure. These don't have to be very close, just not very far off.

    A full script might be:

    import numpy as np
    from scipy.stats import exponnorm
    from scipy.special import erfc
    from lmfit import Model
    import matplotlib.pyplot as plt
    
    def expgaussian(x, amplitude=1, center=0, sigma=1.0, gamma=1.0):
        """ an alternative exponentially modified Gaussian."""
        dx = center-x
        return amplitude* np.exp(gamma*dx) * erfc( dx/(np.sqrt(2)*sigma))
    
    K2 = 1.5
    r2 = exponnorm.rvs(K2, size=500, loc = 205, scale = 40)
    Q           =  np.histogram(r2,500)
    exp_gaus_x  =  Q[1]
    exp_gaus_y  =  Q[0]
    tof_x       =  exp_gaus_x[1:]
    tof_y       =  exp_gaus_y
    
    mod =  Model(expgaussian)
    pars = mod.make_params(sigma=20, gamma=0.1, amplitude=2, center=250)
    
    out  = mod.fit(tof_y, pars, x=tof_x)
    
    print(out.fit_report())
    
    plt.plot(tof_x, tof_y, label='data')
    plt.plot(tof_x, out.best_fit, label='fit')
    plt.legend()
    plt.show()
    

    which will print out

    [[Model]]
        Model(expgaussian)
    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 65
        # data points      = 500
        # variables        = 4
        chi-square         = 487.546692
        reduced chi-square = 0.98295704
        Akaike info crit   = -4.61101662
        Bayesian info crit = 12.2474158
    [[Variables]]
        gamma:      0.01664876 +/- 0.00239048 (14.36%) (init = 0.1)
        sigma:      39.6914678 +/- 3.90960254 (9.85%) (init = 20)
        center:     235.600396 +/- 11.8976560 (5.05%) (init = 250)
        amplitude:  3.43975318 +/- 0.15675053 (4.56%) (init = 2)
    [[Correlations]] (unreported correlations are < 0.100)
        C(gamma, center)     =  0.930
        C(sigma, center)     =  0.870
        C(gamma, amplitude)  =  0.712
        C(gamma, sigma)      =  0.693
        C(center, amplitude) =  0.572
        C(sigma, amplitude)  =  0.285
    

    and show a plot like

    enter image description here

    Hope that helps.