Search code examples
python-3.xcurve-fittinglmfit

Taking experimental errors into account in lmfit


I am trying to implement lmfit into my fitting routines, and I am having issues defining the errors. I premise that I read previous questions regarding the topic on this platform, and I also went through the docs, but some of my doubts are still there.

Below is a complete and minimal example of what I am trying to achieve.

import corner
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy as sp
import lmfit


font = {'fontname':'candara', "fontweight":"light"}
plt.style.use('ggplot')
plt.rcParams["figure.figsize"] = (8,4)
ax_fit_kws = dict(xlim=(0,0.12), ylim=(0.5,1.1))
ax_res_kws = dict(xlim=(0,0.12), ylim=(-0.1,0.1))

def mono_exp(SL_array, a, b):
    model = a * np.exp(-b * SL_array)
    return model
model = lmfit.Model(mono_exp)

SL_array = np.array((0.030, 0.040, 0.060, 0.080, 0.10))
data= np.array((1., 0.9524336, 0.92452666, 0.87995659, 0.82845576))
errs = np.array((0.00029904, 0.00049384, 0.00076344, 0.00053886, 0.00066012))

params = model.make_params(a=0, b=0)

result = model.fit(data=data, params=params, SL_array=SL_array, method="Nelder", markersize=10, weights=errs)

lmfit.report_fit(result)
result.plot(yerr = errs, ax_fit_kws=ax_fit_kws, ax_res_kws=ax_res_kws)

emcee_kws = dict(steps=400, burn=30, thin=20, is_weighted=False,
                 progress=True)
emcee_params = result.params.copy()
emcee_params.add('__lnsigma', value=np.log(0.1), min=np.log(0.001), max=np.log(2.0))
result_emcee = model.fit(data=data, SL_array=SL_array, params=emcee_params, method='emcee',
                         nan_policy='omit', fit_kws=emcee_kws)

lmfit.report_fit(result_emcee)

ax = plt.plot(SL_array, model.eval(params=result.params, SL_array=SL_array), label='Nelder', zorder=100)
result_emcee.plot_fit(ax=ax, data_kws=dict(color='gray', markersize=10), yerr = errs)
emcee_corner = corner.corner(result_emcee.flatchain, labels=result_emcee.var_names,
                             truths=list(result_emcee.params.valuesdict().values()))
plt.show()

My problem is rather simple: I would like the initial Nelder fitting routine to take into account the errs array as errors on the data array (which are my experimentally determined points). I am not sure that calling weights=errs achieves this goal. I have tried the solution implemented here: How do I include errors for my data in the lmfit least squares miniimization, and what is this error for conf_interval2d function in lmfit? But I could not make it work.

Another point which is not really clear to me is: does the emcee part of my fit take into account the residuals from the Nelder routine?

Many thanks in advance!

EDIT

After a bit more of research, I now believe that when calling weights I should actually give 1/err. By implementing this change, and applying scale_covar=False as described elsewhere (How to properly get the errors in lmfit) while artifically incrementing the error values (for instance, deliberate multiplication of the errs array by 100-fold) I do get the fitted parameters errors to increase substantially, which is an expected behaviour. In short: result = model.fit(data=data, params=params, SL_array=SL_array, method="Nelder", markersize=10, weights=errs) was changed to result = model.fit(data=data, params=params, SL_array=SL_array, method="Nelder", markersize=10, weights=1/errs). Is this correct?

I am still rather confused on the implementation of emcee in my case.


Solution

  • What you say in your edits is correct: You want to use weights=1./err to properly weight the residual of data and model by the uncertainties in the data, err.

    You probably want to use the same in your call to model.fit(..., method='emcee') too.

    I should say that the use of emcee in lmfit is rather confusing and gives the unfortunate impression that it is doing a fit. This is simply not true, as emcee (and, really MCMC as a method) can not really do a fit in the sense of "systematically refine parameter values in order to find an improved solution". What it is doing is exploring parameter space in the vicinity of the input parameter values (that happen to be the solution from the Nelder method). This exploration may find (more like "stumble upon" than "seek") an improved solution and the results will reflect the exploration it does.