Search code examples
pythoncurve-fittinglmfit

lmfit - best and init fit do not correspond to report


I am trying to fit data with Python library lmfit. In the data, there are two gaussian functions, the second (LO) higher than the first (TO). The code follows:

TOmod = GaussianModel(prefix="TO_")
LOmod = GaussianModel(prefix="LO_")
TOmod.set_param_hint('TO_center', min = 265, max = 270)
TOmod.set_param_hint('TO_amplitude', min = 3e3, max = 7e3)
TOmod.set_param_hint('TO_sigma', min = 0.1, max = 5)
LOmod.set_param_hint('LO_center', min = 280, max = 310)
LOmod.set_param_hint('LO_amplitude', min = 2e4, max = 4e4)
LOmod.set_param_hint('LO_sigma', min = 0.1, max = 8)
pars = TOmod.guess(y, x=x)
pars += LOmod.guess(y, x=x)
mod = TOmod + LOmod
out  = mod.fit(y, pars, x=x)
print(out.fit_report())

plt.figure("Fitting")
plt.plot(x, y, "bo")
plt.plot(x, out.init_fit, 'k--')
plt.plot(x, out.best_fit, 'r-')
plt.xlabel(r"k [cm$^{-1}$]")
plt.ylabel(r"Intensity")
plt.xlim(250,325)
plt.ylim(-1e3,3e4)
plt.show("Fitting")

I can produce this graph:

lmfit also produces following report:

[[Model]]
    (Model(gaussian, prefix='TO_') + Model(gaussian, prefix='LO_'))
[[Fit Statistics]]
    # function evals   = 30
    # data points      = 576
    # variables        = 6
    chi-square         = 2534929941.545
    reduced chi-square = 4447245.511
    Akaike info crit   = 8823.259
    Bayesian info crit = 8849.395
[[Variables]]
    TO_sigma:       0.14748432 +/- 0        (0.00%) (init= 0.1)
    TO_center:      270        +/- 0        (0.00%) (init= 270)
    TO_amplitude:   3000       +/- 0        (0.00%) (init= 3000)
    TO_fwhm:        0.34729903 +/- 0        (0.00%)  == '2.3548200*TO_sigma'
    TO_height:      8114.94309 +/- 0        (0.00%)  == '0.3989423*TO_amplitude/max(1.e-15, TO_sigma)'
    LO_sigma:       0.25064604 +/- 0        (0.00%) (init= 0.1)
    LO_center:      292.364593 +/- 0        (0.00%) (init= 292.3646)
    LO_amplitude:   24876.9938 +/- 0        (0.00%) (init= 20000)
    LO_fwhm:        0.59022631 +/- 0        (0.00%)  == '2.3548200*LO_sigma'
    LO_height:      39595.6185 +/- 0        (0.00%)  == '0.3989423*LO_amplitude/max(1.e-15, LO_sigma)'
[[Correlations]] (unreported correlations are <  0.100)

At first, final and initial LO_amplitudes do not correspond to the ones in report (f.e. you can see, that initial is 2e4, but in the graph it is at least 5e4). How come?

Secondly, I was expecting that the fit would be better, while all bounds are set.

EDIT 1 - data added

I have posted the full script, only with which is missing is loading the data:

x, y = np.genfromtxt("data.txt", unpack=True)

And here are the data:

3.235230000000000246e+02 8.074899999999997817e+02
3.217950000000000159e+02 7.387500000000000000e+02
3.200659999999999741e+02 8.103400000000001455e+02
3.183369999999999891e+02 9.050399999999999636e+02
3.166080000000000041e+02 1.176100000000000364e+03
3.148790000000000191e+02 1.483189999999999600e+03
3.131490000000000009e+02 1.729449999999999818e+03
3.114189999999999827e+02 2.281949999999999818e+03
3.096890000000000214e+02 2.486050000000000182e+03
3.079580000000000268e+02 2.867739999999999782e+03
3.062269999999999754e+02 3.205949999999999818e+03
3.044950000000000045e+02 4.065239999999999782e+03
3.027629999999999768e+02 5.081539999999999964e+03
3.010310000000000059e+02 6.767100000000000364e+03
2.992989999999999782e+02 9.268700000000000728e+03
2.975659999999999741e+02 1.334320000000000073e+04
2.958319999999999936e+02 1.946429999999999927e+04
2.940989999999999895e+02 2.552240000000000146e+04
2.923650000000000091e+02 2.720209999999999854e+04
2.906309999999999718e+02 2.314690000000000146e+04
2.888960000000000150e+02 1.642840000000000146e+04
2.871610000000000014e+02 1.048929999999999927e+04
2.854259999999999877e+02 6.923909999999999854e+03
2.836899999999999977e+02 4.836300000000000182e+03
2.819540000000000077e+02 3.501239999999999782e+03
2.802169999999999845e+02 2.686470000000000255e+03
2.784809999999999945e+02 2.227590000000000146e+03
2.767440000000000282e+02 1.781840000000000146e+03
2.750059999999999718e+02 1.582319999999999709e+03
2.732679999999999723e+02 1.520989999999999782e+03
2.715299999999999727e+02 2.011989999999999782e+03
2.697919999999999732e+02 3.021930000000000291e+03
2.680529999999999973e+02 4.754640000000000327e+03
2.663140000000000214e+02 5.088010000000000218e+03
2.645740000000000123e+02 3.515579999999999927e+03
2.628340000000000032e+02 2.159310000000000400e+03
2.610939999999999941e+02 1.190029999999999745e+03
2.593530000000000086e+02 7.985300000000002001e+02
2.576120000000000232e+02 5.780700000000001637e+02
2.558710000000000093e+02 4.897800000000002001e+02
2.541289999999999907e+02 3.914800000000000182e+02
2.523870000000000005e+02 3.046999999999998181e+02
2.506450000000000102e+02 3.270000000000000000e+02

Solution

  • I get pretty decent looking results by following the advice I gave and providing initial values and not imposing constraints. Did you try that?

    FWIW, I get even better results using a Lorentzian function for the wider LO peak. Try this:

    import numpy as np
    from lmfit.models import GaussianModel, LorentzianModel
    import matplotlib.pyplot as plt
    
    x, y = np.genfromtxt("data.txt", unpack=True)
    
    TOmod = GaussianModel(prefix="TO_")
    LOmod = LorentzianModel(prefix="LO_")
    mod = TOmod + LOmod
    
    pars = mod.make_params(TO_center=268, TO_sigma=2., TO_amplitude=50000,
                           LO_center=292, LO_sigma=4., LO_amplitude=50000)
    
    out  = mod.fit(y, pars, x=x)
    print(out.fit_report())
    

    For the alleged difference between graph report:

    As you might have read in the documentation (http://lmfit.github.io/lmfit-py/builtin_models.html, amplitude is the multiplier in front of the unit normalized Gaussian function, it is not the largest value the function will take. OTOH, the Parameter LO_height and TO_height are reported and will be the maximum value of the functions. Similarly, sigma is variance not the Full-Width-Half-Maximum, but that is also reported.