the use of lmfit ExponentialGaussianModel( )

Trying to fit ExponentialGaussianModel() from lmfit but getting the following error message: The input contains nan values

I am using Jupyternotebook on windows and I am new to python and lmfit. I find the lmfit documentation to be a bit obscure for a beginner and hope to find help here. Following is my code: I would like to generate a exponential-gaussian histogram, extract data points and practice fitting with lmfit library. ( I would like to practice fitting and finding least number of points that would reproduce the parameters used producing the histogram)

from scipy.stats import exponnorm
from lmfit.models import ExponentialGaussianModel

K2 = 1.5
r2 = exponnorm.rvs(K2, size=500, loc = 205, scale = 40)

Q           =  np.histogram(r2,500)
exp_gaus_x  =  Q[1]
exp_gaus_y  =  Q[0]

tof_x       =  exp_gaus_x[1:]
tof_y       =  exp_gaus_y

mod =  ExponentialGaussianModel()
pars = mod.guess(tof_y, x=tof_x)
out  = mod.fit(tof_y, pars, x=tof_x)
(out.fit_report(min_correl=0.25))

I get the error that there are nan input values. I was expecting a report like shown in the manual.

Solution

The definition of exponential Gaussian used in lmfit is from https://en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution. With the exponential term being

exp[center*gamma + (gamma*sigma)**2/2 - gamma*x]

this has the tendency to give Inf for large-ish values of sigma and gamma, and/or center. I believe that you are getting such Inf values and that this is the cause of the NaN message you are seeing. The fitting routines (in Fortran) do not handle NaN or Inf gracefully (actually, "at all"). This is a real limitation of that particular model. You'll see that the examples on the wikipedia page all have x values much closer to 1 than 200, and gamma and sigma also on the order of 1, not around 50, which is what the automated guess gives.

I think that a simpler definition for an exponentially modified Gaussian would be better for you. With

def expgaussian(x, amplitude=1, center=0, sigma=1.0, gamma=1.0):
    """ an alternative exponentially modified Gaussian."""
    dx = center-x
    return amplitude* np.exp(gamma*dx) * erfc( dx/(np.sqrt(2)*sigma))

You'll get a decent fit, though the meaning of the parameters has changed a bit, and you will need to explicitly give starting values, not rely on a guess() procedure. These don't have to be very close, just not very far off.

A full script might be:

import numpy as np
from scipy.stats import exponnorm
from scipy.special import erfc
from lmfit import Model
import matplotlib.pyplot as plt

def expgaussian(x, amplitude=1, center=0, sigma=1.0, gamma=1.0):
    """ an alternative exponentially modified Gaussian."""
    dx = center-x
    return amplitude* np.exp(gamma*dx) * erfc( dx/(np.sqrt(2)*sigma))

K2 = 1.5
r2 = exponnorm.rvs(K2, size=500, loc = 205, scale = 40)
Q           =  np.histogram(r2,500)
exp_gaus_x  =  Q[1]
exp_gaus_y  =  Q[0]
tof_x       =  exp_gaus_x[1:]
tof_y       =  exp_gaus_y

mod =  Model(expgaussian)
pars = mod.make_params(sigma=20, gamma=0.1, amplitude=2, center=250)

out  = mod.fit(tof_y, pars, x=tof_x)

print(out.fit_report())

plt.plot(tof_x, tof_y, label='data')
plt.plot(tof_x, out.best_fit, label='fit')
plt.legend()
plt.show()

which will print out

[[Model]]
    Model(expgaussian)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 65
    # data points      = 500
    # variables        = 4
    chi-square         = 487.546692
    reduced chi-square = 0.98295704
    Akaike info crit   = -4.61101662
    Bayesian info crit = 12.2474158
[[Variables]]
    gamma:      0.01664876 +/- 0.00239048 (14.36%) (init = 0.1)
    sigma:      39.6914678 +/- 3.90960254 (9.85%) (init = 20)
    center:     235.600396 +/- 11.8976560 (5.05%) (init = 250)
    amplitude:  3.43975318 +/- 0.15675053 (4.56%) (init = 2)
[[Correlations]] (unreported correlations are < 0.100)
    C(gamma, center)     =  0.930
    C(sigma, center)     =  0.870
    C(gamma, amplitude)  =  0.712
    C(gamma, sigma)      =  0.693
    C(center, amplitude) =  0.572
    C(sigma, amplitude)  =  0.285

and show a plot like

Hope that helps.