Tags: python-2.7, curve-fitting, least-squares, lmfit

Python lmfit reduced chi-square too small after weighted fit


I am running a fit in Python 2.7 with lmfit on some test data using the following code. I require a weighted fit with weights of 1/y (using the Levenberg-Marquardt routine). I have defined the weights and am using them here:

from __future__ import division
from numpy import array, var
from lmfit import Model
from lmfit.models import GaussianModel, LinearModel

import matplotlib.pyplot as plt
import seaborn as sns

xd = array([1267, 1268, 1269, 1270, 1271, 1272, 1273, 1274, 1275, 1276,
    1277, 1278, 1279, 1280, 1281, 1282, 1283, 1284, 1285, 1286, 1287, 1288,
     1289, 1290, 1291, 1292, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300,
     1301, 1302, 1303, 1304, 1305, 1306, 1307, 1308, 1309, 1310, 1311, 1312,
     1313, 1314, 1315, 1316, 1317, 1318, 1319, 1320, 1321, 1322, 1323, 1324,
     1325, 1326, 1327, 1328, 1329, 1330, 1331, 1332, 1333, 1334])
yd = array([238, 262, 255, 271, 270, 281, 261, 278, 280, 254, 289, 285, 304, 314,
    329, 342, 379, 450, 449, 564, 613, 705, 769, 899, 987, 1043, 1183, 1295, 1298,
    1521, 1502, 1605, 1639, 1572, 1659, 1558, 1476, 1397, 1267, 1193, 1016, 951,
    835, 741, 678, 558, 502, 480, 442, 399, 331, 334, 308, 283, 296, 265, 264, 
    273, 258, 270, 262, 263, 239, 263, 251, 246, 246, 234])

mod = GaussianModel() + LinearModel()
pars  = mod.make_params(amplitude=25300, center=1299, sigma=7, slope=0, intercept=450)
result = mod.fit(yd, pars, method='leastsq', x=xd, weights=1./yd)
rsq = 1 - result.residual.var() / var(yd)
print(result.fit_report())
print(rsq)

plt.plot(xd, yd,         'bo', label='raw')
plt.plot(xd, result.init_fit, 'k--', label='Initial_Guess')
plt.plot(xd, result.best_fit, 'r-', label='Best')
plt.legend()
plt.show()

The output is:

[[Model]]
    (Model(gaussian) + Model(linear))
[[Fit Statistics]]
    # function evals   = 27
    # data points      = 68
    # variables        = 5
    chi-square         = 0.099
    reduced chi-square = 0.002
    Akaike info crit   = -434.115
    Bayesian info crit = -423.017
[[Variables]]
    sigma:       7.57360038 +/- 0.063715 (0.84%) (init= 7)
    center:      1299.41410 +/- 0.071046 (0.01%) (init= 1299)
    amplitude:   25369.3304 +/- 263.0961 (1.04%) (init= 25300)
    slope:      -0.15015228 +/- 0.071540 (47.65%) (init= 0)
    intercept:   452.838215 +/- 93.28860 (20.60%) (init= 450)
    fwhm:        17.8344656 +/- 0.150037 (0.84%)  == '2.3548200*sigma'
    height:      1336.33919 +/- 17.28192 (1.29%)  == '0.3989423*amplitude/max(1.e-15, sigma)'
.
.
.
.
0.999999993313

The last line of the output (just above, printed by the statement immediately before plt.plot(xd, yd, 'bo', label='raw')) is the R^2; the resulting fit is shown in the attached plot.

The R^2 and a visual inspection of the output suggest this is a reasonable fit. I am expecting a reduced chi-squared of order 1.00 (source). However, the returned reduced chi-squared is several orders of magnitude smaller than 1.00.

Since lmfit applies no weights by default and I need a weighted fit, I have defined weights, but I think I need to be specifying them differently. My suspicion is that this specification of the weights is what is causing the reduced chi-squared to be so small.

Is there a different way to specify weights, or some other parameter, such that the reduced chi-squared after the curve fit is close to or on the same magnitude as 1.00?


Solution

  • The weight in lmfit is a multiplicative factor applied to the residual that is minimized in the least-squares sense (a short numerical sketch of how this enters the reported chi-square is given at the end of this answer). That is, it replaces

    residual = model - data
    

    with

    residual = (model - data) * weights
    

    A common approach, and the one I think you are intending, is to set the weights to 1.0/standard_deviation_in_data (that is, 1/sqrt(variance_in_data)), since that is what is usually needed to get a reduced chi-square near 1 for a good fit, as the excellent writeup you link to discusses.

    As discussed there, the problem is determining the variance in the data. In many cases, such as when the signal is dominated by counting statistics, the variance can be estimated as the data itself, so the standard deviation is roughly sqrt(data). This ignores many other sources of noise, but it is often a good starting point. As it happens, I believe using

    result = model.fit(..., weights=np.sqrt(1.0/yd))
    

    will lead to a reduced chi-square of around 0.8 for your case, which I think is what you want (a full sketch of the fit with this weighting is also given at the end of this answer).

    Also, to clarify a related point: the writeup you link to discusses rescaling the uncertainties in the fitted parameters when the reduced chi-square is far from 1. Lmfit does that rescaling by default (the scale_covar option can turn it off), so changing the overall scale of the weights will not change the scale of the uncertainties in the parameters sigma, center, etc. The best-fit values and their uncertainties will change somewhat, because changing the weighting changes the emphasis placed on each data point, but the best-fit values will not move much, and the estimated uncertainties should stay the same order of magnitude even if your estimate of the variance in the data (and hence the reduced chi-square) is off by a few orders of magnitude.

    That is, changing your script to use weights=1.0/np.sqrt(yd) will bring reduced chi-square much closer to 1, but it will not change the uncertainties in the fitted variables very much.
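
    As referenced above, here is a minimal sketch of how the weights enter the statistic lmfit reports. It reuses xd, yd, mod, and pars from the question and relies on the standard ModelResult attributes chisqr, redchi, ndata, and nvarys; the numbers quoted in the comments are the ones from the fit report above.

    import numpy as np

    # Fit with the original weighting (1/y) from the question.
    result = mod.fit(yd, pars, method='leastsq', x=xd, weights=1.0/yd)

    # lmfit minimizes residual = (model - data) * weights, so
    #   chi-square         = sum(((model - data) * weights)**2)
    #   reduced chi-square = chi-square / (ndata - nvarys)
    chisqr_by_hand = np.sum(((result.best_fit - yd) * (1.0/yd))**2)
    redchi_by_hand = chisqr_by_hand / (result.ndata - result.nvarys)

    print(result.chisqr, chisqr_by_hand)   # ~0.099, matching the report
    print(result.redchi, redchi_by_hand)   # ~0.002: weights of 1/y are far too large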
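
    And here is a sketch of the full script with the suggested weighting, again reusing the xd and yd arrays from the question (note that it needs import numpy as np, which the original script did not have):

    import numpy as np
    from lmfit.models import GaussianModel, LinearModel

    # xd, yd as defined in the question
    mod = GaussianModel() + LinearModel()
    pars = mod.make_params(amplitude=25300, center=1299, sigma=7, slope=0, intercept=450)

    # Counting statistics: variance ~ y, so stddev ~ sqrt(y) and weights = 1/stddev
    result = mod.fit(yd, pars, method='leastsq', x=xd, weights=np.sqrt(1.0/yd))

    # Reduced chi-square should now be of order 1 (around 0.8 for this data)
    print(result.fit_report())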