Tags: python, statistics, curve-fitting, chi-squared, lmfit

Problem with scaling of model to obtain Chi-squared around 1


I have a problem with assessing the goodness of fit. As far as I know, the reduced chi-square represents the goodness of fit and should be around 1, but in my case it is 0.033.

I have 3 questions:

1- Is my model correct in how it accounts for the experimental uncertainties of the data?

2- How can I scale the reduced chi-square to be around 1? As I read, I have to find a better scaling, but if I am dealing with several fit models, should I scale each one independently, or should all of them have the same scaling?

3- Are there any other statistical parameters for choosing the best model among several fit models, other than the reduced chi-square, such as the coefficient of determination?

Here is my code and data:

import numpy as np
from lmfit import Minimizer, Parameters, fit_report
import scipy.constants as sc

x=np.array([-0.55140294, -0.53524022, -0.51157048, -0.46087502, -0.36867469,
       -0.36463724, -0.33394184, -0.31759441, -0.24301006, -0.19108247,
       -0.17992331, -0.15978266, -0.12565861, -0.06353306, -0.03416749,
        0.        ,  0.05381462,  0.05741924,  0.12256307,  0.13431629,
        0.15070565,  0.24975708,  0.30714032,  0.36017666,  0.36062918,
        0.44028426,  0.46225328,  0.49792282,  0.51374084,  0.54654304,
        0.5681194 ])
Data=np.array([3.10022439e-07, 5.28936553e-07, 1.08389150e-06, 4.51866520e-06,
       4.03761662e-05, 4.51393005e-05, 8.40351313e-05, 1.11528073e-04,
       3.84782855e-04, 7.40352481e-04, 8.34990761e-04, 1.01873368e-03,
       1.31826299e-03, 1.85815838e-03, 2.02222472e-03, 2.09575003e-03,
       1.98385694e-03, 1.91161400e-03, 1.40295968e-03, 1.28660846e-03,
       1.12519421e-03, 3.50500171e-04, 1.43286865e-04, 5.16806731e-05,
       4.98973395e-05, 8.13812022e-06, 4.47760445e-06, 1.66754498e-06,
       1.07843394e-06, 3.95451986e-07, 1.97595319e-07])
uncertainities=np.array([3.45420178e-06, 3.69177339e-06, 4.15460196e-06, 5.79355068e-06,
       1.25738539e-05, 1.30260341e-05, 1.69780223e-05, 1.94687535e-05,
       3.41877881e-05, 4.68776601e-05, 4.97061736e-05, 5.47758901e-05,
       6.29249004e-05, 7.43568853e-05, 7.73802945e-05, 7.85165928e-05,
       7.49519891e-05, 7.44931618e-05, 6.24757861e-05, 5.97435073e-05,
       5.57673389e-05, 3.15179837e-05, 2.03474708e-05, 1.29556740e-05,
       1.29044986e-05, 6.53164560e-06, 5.52338978e-06, 4.36260354e-06,
       3.99640819e-06, 3.45133957e-06, 3.21225716e-06])

# Gaussian peak of area A centered at xc, on a constant background y0
def fGaussian(x, sig, A, xc, y0):
    y = A / ((2 * sc.pi)**0.5 * sig) * np.exp(-(x - xc)**2 / (2 * sig**2)) + y0
    return y

# residuals weighted by the measurement uncertainties
def fcn2min_fGauss(params, x, data, uncertainities):
    sig = params['sig']
    A = params['A']
    xc = params['xc']
    y0 = params['y0']
    model = fGaussian(x, sig, A, xc, y0)
    return (model - data) / uncertainities

# =============================================================================
# create a set of Parameters for fGauss
params_fGauss= Parameters()
params_fGauss.add('sig', value=10**-3, min=0)
params_fGauss.add('A', value=10**-4, min=0)
params_fGauss.add('xc', value=0, min=0)
params_fGauss.add('y0', value=0, min=0)
print ('******************** Gauss Parameters **********************')
params_fGauss.pretty_print()
print ('************************************************************')
# =============================================================================
print('fitters initiation')
fcn_args=(x, Data,uncertainities)
fitter_fGauss = Minimizer(fcn2min_fGauss, params_fGauss, fcn_args=fcn_args,nan_policy='propagate')
# =============================================================================
# Least-Squares minimization, using Trust Region Reflective method
result_least_squares_Gauss = fitter_fGauss.minimize(method='least_squares',jac='3-point',loss='soft_l1')
print('fGauss fit report ---------------------------------')
print(fit_report(result_least_squares_Gauss))

The fitting report is:

[[Fit Statistics]]
    # fitting method   = least_squares
    # function evals   = 37
    # data points      = 31
    # variables        = 4
    chi-square         = 0.91407466
    reduced chi-square = 0.03385462
    Akaike info crit   = -101.238737
    Bayesian info crit = -95.5027884
[[Variables]]
    sig:  0.13202573 +/- 2.6268e-04 (0.20%) (init = 0.001)
    A:    6.9992e-04 +/- 1.6989e-06 (0.24%) (init = 0.0001)
    xc:   0.00130681 +/- 3.3337e-04 (25.51%) (init = 0)
    y0:   1.4545e-24 +/- 2.4519e-07 (16857606207555559424.00%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(sig, y0) = -0.314
    C(A, y0)   = -0.149
    C(sig, xc) =  0.124
    C(sig, A)  =  0.115

The data and the fitted curve: [plot not reproduced here]


Solution

  • It sort of looks like your uncertainties are overestimated. Where your data (which resembles a Gaussian to high precision) is near 0, the uncertainty is about 10x the value; near the peak, the uncertainty is about 3% of the value. But if the data really had such uncertainty, you would expect a lot more variation in the actual data. You could test that by simulating a Gaussian with your expected amplitude and width and adding fake noise with a scale of 5.e-5 or so (e.g. using numpy.random.normal(scale=5.e-5, size=len(x))). I think you'll see that your actual data is not that noisy.
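A minimal sketch of that simulation, using parameter values close to the fit report above and the suggested noise scale (the seed and x grid are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# parameter values close to the fit result above
sig, A, xc, y0 = 0.132, 7.0e-4, 0.0, 0.0

x = np.linspace(-0.55, 0.57, 31)
clean = A / ((2 * np.pi)**0.5 * sig) * np.exp(-(x - xc)**2 / (2 * sig**2)) + y0

# add fake noise of the suggested size
noisy = clean + rng.normal(scale=5.e-5, size=len(x))

# the simulated points scatter visibly around the curve, and tail points
# can even go negative -- quite unlike the real data, which follows the
# Gaussian shape very closely
print(np.std(noisy - clean))
```

If the real measurements truly had uncertainties of this size, they would look like `noisy`, not like the smooth data above.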

    Just based on your value of reduced chi-square, you might guess that the uncertainties should be scaled by a factor of ~ sqrt(0.03385) ~ 0.18. That is, the uncertainties should be roughly 1/5 to 1/6 of what you have now.

    FWIW, that won't change the estimated uncertainties in the parameters: by default, lmfit scales those by the square root of reduced chi-square, basically to avoid this problem of needing very well-scaled data uncertainties.
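To make the scaling argument concrete: multiplying every sigma_i by a factor s multiplies chi-square by 1/s**2, so choosing s = sqrt(reduced chi-square) drives the reduced chi-square to 1. A quick check with the numbers from the report:

```python
import numpy as np

ndata, nvarys = 31, 4
chi2 = 0.91407466                    # chi-square from the fit report
red_chi2 = chi2 / (ndata - nvarys)   # ~0.0339, as reported

# multiply every uncertainty by this factor (~0.18, i.e. shrink to ~1/5)
scale = np.sqrt(red_chi2)

# each residual (model - data)/sigma grows by 1/scale,
# so chi-square grows by 1/scale**2
new_red_chi2 = red_chi2 / scale**2
print(new_red_chi2)   # ~1.0
```

In practice you would of course refit with the rescaled uncertainty array rather than rescale the statistic after the fact, but the arithmetic is the same.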

    1- Is my model correct in how it accounts for the experimental 
    uncertainties of the data?
    

    Yes, I think so. I think the scale is just off.

    2- How can I scale the reduced chi-square to be around 1? As I 
    read, I have to find a better scaling, but if I am dealing with 
    several fit models, should I scale each one independently, or 
    should all of them have the same scaling?
    

    Scaling your uncertainties by a factor of 1/5 to 1/6 will help.

    3- Are there any other statistical parameters for choosing the 
    best model among several fit models, other than the reduced 
    chi-square, such as the coefficient of determination?
    

    Yes: the "Akaike info crit" and "Bayesian info crit" in the fit report are the Akaike and Bayesian information criteria, alternative statistics that serve a similar role to reduced chi-square. These are all particularly useful for comparing fits with different numbers of variable parameters.
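For reference, lmfit derives those criteria from chi-square; assuming its documented definitions, the numbers in the report above can be reproduced directly. When comparing several models fit to the same data, the model with the lower AIC/BIC is preferred, with BIC penalizing extra parameters more strongly:

```python
import numpy as np

ndata, nvarys = 31, 4     # data points and varied parameters, from the report
chi2 = 0.91407466

# lmfit's definitions: aic = N*ln(chi2/N) + 2*Nvarys,
#                      bic = N*ln(chi2/N) + ln(N)*Nvarys
neg2_log_likelihood = ndata * np.log(chi2 / ndata)
aic = neg2_log_likelihood + 2 * nvarys
bic = neg2_log_likelihood + np.log(ndata) * nvarys

print(aic, bic)   # matches the report: -101.238737, -95.5027884
```

Note that scaling all the uncertainties by a common factor shifts AIC/BIC for every model by the same amount, so model *comparisons* are unaffected by the overall scale.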