I have some data from the lab, which I'm fitting to the function
f(s, params) = amp * (s/sp) / (1 + s/sp + 1.041) + bg (couldn't figure out how to typeset this)
I set absolute_sigma=True in curve_fit to get the absolute uncertainties of the parameters (sp, amp, bg), but the errors calculated from np.sqrt(np.diag(pcov)) seem unreasonable. See the plot below. The blue dots are the data, and the red line is f(s, *popt). The green line replaces the optimal sp value with sp minus its error as calculated from np.sqrt(np.diag(pcov)); the orange line uses sp plus the same value. I would expect the +/- lines to be much tighter to the red line.
Here's a minimal example:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Function for fitting
def scattering_rate(s, sp, amp, bg):
return amp * (s/sp) / (1 + s/sp + (-20/19.6)**2) + bg
# data
s = np.array([0.6, 1.2, 2.3, 4.3, 8.1, 15.2, 28.5, 53.4])
y = np.array([8.6, 8.5, 8.9, 9.5, 10.6, 12.6, 15.5, 18.3])
# Fit data to saturated scattering rate
popt, pcov = curve_fit(scattering_rate, s, y, absolute_sigma=True)
print('Fit parameters', popt)
print('Fit uncertainties', np.sqrt(np.diag(pcov)))
# Construct fit from optimized parameters
fit_s = np.linspace(np.min(s), np.max(s), 100)
fit = scattering_rate(fit_s, *popt)
# Consider one error difference in sp value
fit_plus_err = scattering_rate(fit_s, popt[0] + np.sqrt(np.diag(pcov))[0], popt[1], popt[2])
fit_minus_err = scattering_rate(fit_s, popt[0] - np.sqrt(np.diag(pcov))[0], popt[1], popt[2])
# Plot
plt.plot(s, y, '-o', markeredgecolor='k', label='data')
plt.plot(fit_s, fit_plus_err, label='sp + err')
plt.plot(fit_s, fit_minus_err, label='sp - err')
plt.plot(fit_s, fit, '-r', label='opt sp')
plt.xlabel('s')
plt.ylabel('y')
plt.legend()
plt.grid()
plt.show()
Edit
Following @jlandercy's answer, we need the error bars of the original data, which are y_err = np.array([0.242, 0.231, 0.282, 0.31, 0.373]). Including that in curve_fit's sigma argument, the results look much better, though still a bit off.
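Passing those error bars is just a matter of the sigma keyword (a minimal sketch; it assumes y_err holds one uncertainty per point in y):
# Refit, weighting each point by its measured uncertainty
popt, pcov = curve_fit(scattering_rate, s, y, sigma=y_err, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))  # absolute 1-sigma parameter uncertainties
print('Fit parameters', popt)
print('Fit uncertainties', perr)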
When using absolute_sigma=True, you are advised to feed the sigma argument as well; if you don't, the uncertainties are replaced by 1., which makes the Chi Square loss function behave like a plain RSS, and pcov becomes somewhat meaningless if your uncertainties are not unitary. Provide the uncertainties, or an estimate of them, to get a meaningful pcov matrix.
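To see this, a quick check (a sketch, assuming the data and function from the question): with absolute_sigma=True and no sigma, curve_fit behaves exactly as if every point had unit uncertainty.
# Omitting sigma with absolute_sigma=True is equivalent to sigma = 1 everywhere
popt_a, pcov_a = curve_fit(scattering_rate, s, y, absolute_sigma=True)
popt_b, pcov_b = curve_fit(scattering_rate, s, y, sigma=np.ones(y.size), absolute_sigma=True)
print(np.allclose(pcov_a, pcov_b))  # True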
Setting sigma to 0.15 with your data drastically improves the pcov matrix while keeping the Chi Square statistic significant:
sy = np.full(y.size, 0.15)
popt, pcov = curve_fit(scattering_rate, s, y, sigma=sy, absolute_sigma=True)
np.sqrt(pcov)
# array([[3.25479068, 2.07084837, 0.45391942],
# [2.07084837, 1.35773667, 0.2563505 ],
# [0.45391942, 0.2563505 , 0.09865549]])
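As a sanity check on the "Chi Square statistic significant" part, the reduced Chi Square and its p-value can be computed along these lines (a sketch using scipy.stats, with the sy assumed above):
from scipy import stats

# Goodness of fit for the residuals weighted by the assumed sigma
residuals = y - scattering_rate(s, *popt)
chi2 = np.sum((residuals / sy)**2)
dof = y.size - popt.size  # number of points minus number of fitted parameters
print('reduced Chi Square:', chi2 / dof)
print('p-value:', stats.chi2.sf(chi2, dof))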
You have two direct options to improve the error:

- collect more data points (your dof=2 makes the Chi Square distribution very asymmetric);
- reduce the uncertainty on y.

The figure below shows the impact on a synthetic dataset with parameters equivalent to yours (see the sketch after it):
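A rough sketch of how such a synthetic comparison could be set up (Gaussian noise around the fitted curve, with the fitted popt reused as ground truth; an illustration, not the code behind the figure):
# Illustrative only: how more points or a smaller sigma shrink the uncertainties
rng = np.random.default_rng(0)

def simulated_errors(n_points, sigma_y):
    s_sim = np.geomspace(s.min(), s.max(), n_points)
    y_sim = scattering_rate(s_sim, *popt) + rng.normal(0.0, sigma_y, n_points)
    _, c = curve_fit(scattering_rate, s_sim, y_sim, p0=popt,
                     sigma=np.full(n_points, sigma_y), absolute_sigma=True)
    return np.sqrt(np.diag(c))

print(' 8 points, sigma=0.15:', simulated_errors(8, 0.15))
print('40 points, sigma=0.15:', simulated_errors(40, 0.15))
print(' 8 points, sigma=0.05:', simulated_errors(8, 0.05))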