Search code examples
pythonnumpydata-analysischi-squared

Generating chi_squared using best fit


I am generating 1000 iterations of some fake data set and using curve_fit to find the best fitting model (A gaussian with certain mean, offset,amp...) Instead of guessing the mean value of my fitting model, use curve_fit so that I can have better chi2 values. But now the output is weird and even the data and the model don't agree.This is the scatter plots of my data and fitting model

Also, if I plot a histogram of my chi2 values, The chi2_fit values don't spread out like expected

true_mean = 5 #Assume that we know the perfect fitting model is gaussian 
#with mean value 5. And use try_mean with different mean values to see how 
#does the chi2 behave
true_amp = 100
true_wid = 2
true_offset =10
x_values =  np.array([i for i in np.arange(0,10,0.4)])
exact_y_values = np.array([true_offset + true_amp*
                               np.exp(-((i-true_mean)/true_wid)**2)
                               for i in x_values])
    

    
try_mean = 4.8   # Notice the data is generated centered at 5;
# by comparing it to a 4.8 we expect disagreement.
try_amp = 100
try_wid = 2  
try_offset =10
try_these_y_values = np.array([try_offset + try_amp*
                                   np.exp(-((i-try_mean)/try_wid)**2)
                                   for i in x_values])

def func (x_values,offset,amp,mean,wid):
    return (offset + amp*np.exp(-((i-mean)/wid)**2))
#Excercise 2
#def func (x_values,offset,amp,mean,wid): return (offset + amp*np.exp(-((i-mean)/wid)**2))

chi2_fit=np.zeros(1000)
for i in range (1000):

    fake_exp_y_values = np.array([np.random.poisson(y)
                                      for y in exact_y_values])
    p01=[true_offset,true_amp,true_mean,true_wid]
    [popt,pcov]=opt.curve_fit(func,x_values, fake_exp_y_values,p01)
    y_values_fit=np.array([popt[0]+ popt[1]
                           *np.exp(
                               -((x_values-popt[2])/popt[3])**2)])
    residuals=fake_exp_y_values-y_values_fit
    y_err=np.clip(np.sqrt(fake_exp_y_values),1,9999)
    pulls=residuals/y_err
    chi2_fit[i]=np.sum(pulls**2)
    
plt.hist(chi2_fit)
plt.scatter(x_values,exact_y_values,color='k',ls='--',
            label='true model')
plt.scatter(x_values,y_values_fit,color='r')
plt.errorbar(x_values,y_values_fit,yerr=y_err)

What should be modified in order to have a reasonable plot.something like this And the histogram should be like this


Solution

  • The problem is definition of your function and usage of i:

    def func (x_values,offset,amp,mean,wid):
        return (offset + amp*np.exp(-((i-mean)/wid)**2))
    
    for i in range (1000):
        ...
        [popt,pcov]=opt.curve_fit(func,x_values, fake_exp_y_values,p01)
        ...
    

    Corrected version:

    def func (x_values,offset,amp,mean,wid):
        return (offset + amp*np.exp(-((x_values-mean)/wid)**2))
    

    Also please next time post MRE. This one was not minimal and not reproduciable (I had to add y_values_fit = y_values_fit.flatten() at the end to make it work).