I have a problem with assessing the goodness of fit. As far as I know, the reduced chi-square represents the goodness of fit and should be around 1, but in my case it is 0.033.
I have 3 questions:
1- Is my model right to consider the experimental uncertainties of the data in the fitting model?
2- How can I scale the reduced chi-square to get it around 1? As I read, I have to find a better scaling, but if I am dealing with several fit models, should I scale each one independently, or should all of them have the same scaling?
3- Are there any other statistical parameters to determine the best fit model among several fit models, other than the reduced chi-square, such as the coefficient of determination?
Here is my code and data:
import numpy as np
from lmfit import Minimizer, Parameters, fit_report
import scipy.constants as sc
x=np.array([-0.55140294, -0.53524022, -0.51157048, -0.46087502, -0.36867469,
-0.36463724, -0.33394184, -0.31759441, -0.24301006, -0.19108247,
-0.17992331, -0.15978266, -0.12565861, -0.06353306, -0.03416749,
0. , 0.05381462, 0.05741924, 0.12256307, 0.13431629,
0.15070565, 0.24975708, 0.30714032, 0.36017666, 0.36062918,
0.44028426, 0.46225328, 0.49792282, 0.51374084, 0.54654304,
0.5681194 ])
Data=np.array([3.10022439e-07, 5.28936553e-07, 1.08389150e-06, 4.51866520e-06,
4.03761662e-05, 4.51393005e-05, 8.40351313e-05, 1.11528073e-04,
3.84782855e-04, 7.40352481e-04, 8.34990761e-04, 1.01873368e-03,
1.31826299e-03, 1.85815838e-03, 2.02222472e-03, 2.09575003e-03,
1.98385694e-03, 1.91161400e-03, 1.40295968e-03, 1.28660846e-03,
1.12519421e-03, 3.50500171e-04, 1.43286865e-04, 5.16806731e-05,
4.98973395e-05, 8.13812022e-06, 4.47760445e-06, 1.66754498e-06,
1.07843394e-06, 3.95451986e-07, 1.97595319e-07])
uncertainities=np.array([3.45420178e-06, 3.69177339e-06, 4.15460196e-06, 5.79355068e-06,
1.25738539e-05, 1.30260341e-05, 1.69780223e-05, 1.94687535e-05,
3.41877881e-05, 4.68776601e-05, 4.97061736e-05, 5.47758901e-05,
6.29249004e-05, 7.43568853e-05, 7.73802945e-05, 7.85165928e-05,
7.49519891e-05, 7.44931618e-05, 6.24757861e-05, 5.97435073e-05,
5.57673389e-05, 3.15179837e-05, 2.03474708e-05, 1.29556740e-05,
1.29044986e-05, 6.53164560e-06, 5.52338978e-06, 4.36260354e-06,
3.99640819e-06, 3.45133957e-06, 3.21225716e-06])
def fGaussian(x, sig, A, xc, y0):
    # area-normalized Gaussian of width sig centered at xc, scaled by
    # amplitude A, on top of a constant offset y0
    y = A / ((2 * sc.pi) ** 0.5 * sig) * np.exp(-(x - xc) ** 2 / (2 * sig ** 2)) + y0
    return y

def fcn2min_fGauss(params, x, data, uncertainities):
    # residual function for lmfit: (model - data) weighted by the
    # experimental uncertainties, so the minimizer computes chi-square
    sig = params['sig']
    A = params['A']
    xc = params['xc']
    y0 = params['y0']
    model = fGaussian(x, sig, A, xc, y0)
    return (model - data) / uncertainities
# =============================================================================
# create a set of Parameters for fGauss
params_fGauss = Parameters()
params_fGauss.add('sig', value=1e-3, min=0)
params_fGauss.add('A', value=1e-4, min=0)
params_fGauss.add('xc', value=0, min=0)
params_fGauss.add('y0', value=0, min=0)
print ('******************** Gauss Parameters **********************')
params_fGauss.pretty_print()
print ('************************************************************')
# =============================================================================
print('fitter initialization')
fcn_args = (x, Data, uncertainities)
fitter_fGauss = Minimizer(fcn2min_fGauss, params_fGauss, fcn_args=fcn_args, nan_policy='propagate')
# =============================================================================
# Least-Squares minimization, using Trust Region Reflective method
result_least_squares_Gauss = fitter_fGauss.minimize(method='least_squares', jac='3-point', loss='soft_l1')
print('fGauss fit report ---------------------------------')
print(fit_report(result_least_squares_Gauss))
The fit report is:
[[Fit Statistics]]
# fitting method = least_squares
# function evals = 37
# data points = 31
# variables = 4
chi-square = 0.91407466
reduced chi-square = 0.03385462
Akaike info crit = -101.238737
Bayesian info crit = -95.5027884
[[Variables]]
sig: 0.13202573 +/- 2.6268e-04 (0.20%) (init = 0.001)
A: 6.9992e-04 +/- 1.6989e-06 (0.24%) (init = 0.0001)
xc: 0.00130681 +/- 3.3337e-04 (25.51%) (init = 0)
y0: 1.4545e-24 +/- 2.4519e-07 (16857606207555559424.00%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
C(sig, y0) = -0.314
C(A, y0) = -0.149
C(sig, xc) = 0.124
C(sig, A) = 0.115
It sort of looks like your uncertainties are overestimated. Where your data (which resembles a Gaussian to high precision) is near 0, the uncertainty is about 10x the value. Near the peak, the uncertainty is about 3% of the value. But if the data really had such uncertainty, you would expect a lot more variation in the actual data. You could test that by simulating a Gaussian with your expected amplitude and width and adding fake noise with a size of 5.e-5 or so (e.g., using numpy.random.normal(scale=5.e-5, size=len(x))). I think you'll see that your actual data is not that noisy.
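For instance, here is a minimal sketch of that check; it reuses x and Data from your script and plugs in rounded best-fit values from your report:
import numpy as np
rng = np.random.default_rng()
# noise-free Gaussian built from the reported best-fit values
sig_fit, A_fit, xc_fit, y0_fit = 0.132, 6.999e-4, 0.0013, 0.0
y_true = A_fit / ((2 * np.pi) ** 0.5 * sig_fit) * np.exp(-(x - xc_fit) ** 2 / (2 * sig_fit ** 2)) + y0_fit
# fake data with noise of the size your uncertainties claim (~5e-5)
y_fake = y_true + rng.normal(scale=5.e-5, size=len(x))
# if the stated uncertainties were realistic, Data would scatter around the
# fit about as much as y_fake scatters around y_true; it scatters far less
print('fake-data scatter:', np.std(y_fake - y_true))
print('real-data scatter:', np.std(Data - y_true))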
Just based on your value of reduced chi-square, you might guess that the uncertainties should be scaled by a factor of ~sqrt(0.03385) ~ 0.18. That is, the uncertainties should be about 1/5 to 1/6 of what you have now.
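As a rough sketch of that rescaling (reusing the objects from your script; note this uses the default least-squares loss rather than soft_l1, so chi-square keeps its standard meaning):
# scale the uncertainties by sqrt(reduced chi-square) and refit
scale = np.sqrt(result_least_squares_Gauss.redchi)
fitter_rescaled = Minimizer(fcn2min_fGauss, params_fGauss,
                            fcn_args=(x, Data, uncertainities * scale),
                            nan_policy='propagate')
result_rescaled = fitter_rescaled.minimize(method='least_squares')
# with well-scaled uncertainties this should now be close to 1
print('reduced chi-square after rescaling:', result_rescaled.redchi)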
FWIW, that won't change the estimated uncertainties in the parameters: those are automatically (well, by default) scaled by the sqrt of reduced chi-square, basically to avoid this problem of needing very well-scaled uncertainties.
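If you would rather see the parameter uncertainties implied by your stated data uncertainties taken at face value, Minimizer accepts a scale_covar keyword (True by default) that controls this rescaling:
# turn off the automatic sqrt(reduced chi-square) rescaling of the
# covariance, so parameter uncertainties reflect the stated data uncertainties
fitter_raw = Minimizer(fcn2min_fGauss, params_fGauss, fcn_args=fcn_args,
                       nan_policy='propagate', scale_covar=False)
result_raw = fitter_raw.minimize(method='least_squares')
print(fit_report(result_raw))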
1- Is my model right to consider the experimental uncertainties of the data in the fitting model?
Yes, I think so. I think the scale is just off.
2- How can I scale the reduced chi-square to get it around 1? As I read, I have to find a better scaling, but if I am dealing with several fit models, should I scale each one independently, or should all of them have the same scaling?
Scaling your uncertainties by a factor of 1/5 to 1/6 will help bring the reduced chi-square close to 1.
3- Are there any other statistical parameters to determine the best fit model among several fit models, other than the reduced chi-square, such as the coefficient of determination?
Yes, the statistics "Akaike info crit" (AIC) and "Bayesian info crit" (BIC) reported in the fit statistics serve similar roles to reduced chi-square. These are all particularly useful for comparing fits with different numbers of variable parameters, since they penalize adding parameters.
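As a sketch of such a comparison, here is a hypothetical second candidate (a Lorentzian, not from your question) fitted the same way; the aic, bic, and redchi attributes of each result can then be compared directly, with lower AIC/BIC indicating the preferred model:
def fLorentzian(x, gam, A, xc, y0):
    # area-normalized Lorentzian of half-width gam centered at xc, plus offset y0
    return A * gam / (sc.pi * ((x - xc) ** 2 + gam ** 2)) + y0

def fcn2min_fLorentz(params, x, data, uncertainities):
    model = fLorentzian(x, params['gam'], params['A'], params['xc'], params['y0'])
    return (model - data) / uncertainities

params_fLorentz = Parameters()
params_fLorentz.add('gam', value=0.1, min=0)
params_fLorentz.add('A', value=1e-4, min=0)
params_fLorentz.add('xc', value=0)
params_fLorentz.add('y0', value=0, min=0)

fitter_fLorentz = Minimizer(fcn2min_fLorentz, params_fLorentz,
                            fcn_args=fcn_args, nan_policy='propagate')
result_lorentz = fitter_fLorentz.minimize(method='least_squares')

for name, res in (('Gaussian', result_least_squares_Gauss),
                  ('Lorentzian', result_lorentz)):
    print(name, ': AIC =', res.aic, ' BIC =', res.bic, ' redchi =', res.redchi)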