Search code examples
pythonnumpymatplotlibscipygaussian

Trying to fit a Gaussian+Line fit to data


I'm very new to python and programming in general (I started about one month ago). I am working on a project right now. I apologize if my grasp of the jargon is weak.

I have been given a csv file that has a wavelength and then 30 observations from that wavelength. I am supposed to run some code that will have wavelength as the independent variable, then 1 set of observations as the dependant variables. From this, I must fit a line+Gaussian function to the data, and I can take the mean(mu) of the Gaussian as the variable I need for more analysis. In order to get a good fit, I must find a way to automate the initial guesses for the Gaussian+line's parameters. I feel like I may have stumbled across this somehow already, as the gaussian fits it generates for each set of observations seem to fit really well. One way to get a good estimate for the gradient and y-intercept of the straight line is to plot a linear fit for the data and use those values of m and c in the gaussian+line fit but I'm not sure if I have done that correctly.

My question is, how can I extract the value of the gaussian mean (mu) from this data? Right now, my console is showing 0 +/- 0 for the mean, even though that certainly cannot be true. The code I am about to show is for the second set of observations, though I hope once I secure the code for one set of observations, I can simply copy and paste it for all the other sets of observations and simply tweak the data read in.

x=spectral_data[:,0] #selecting column 0 to be x-axis data
y2=spectral_data[:,2] #selecting column 1 to be y-axis data
plt.scatter(x,y2) #produce scatter plot
plt.title('Observation 2')
plt.ylabel('Intensity (arbitrary units)')
plt.xlabel('Wavelength (m)')

m,c=np.polyfit(x,y2,deg=1) #fits a linear model to the data, with m = slope and c = intercept
plt.plot(x,y2,'*')
plt.plot(x,c+m*x,'-') #plots the fit
print('The slope and intercept of the regression is,', m,c)
m_best=m
c_best=c
def fit_gauss(x,a,mu,sig,m,c): #defining a function for the gaussian+line fit
    gaus = a*sp.exp(-(x-mu)**2/(2*sig**2))
    line = m*x+c
    return gaus + line

initial_guess=[160,7.1*10**-7,0.2*10**-7,m_best,c_best]
po,po_cov=sp.optimize.curve_fit(fit_gauss,x,y2,initial_guess)

#print("The parameters")
#print(po)
#print('The covariance matrix')
#print(po_cov)

print("The signal parameters are")
print(" Gaussian amplitude = %.1f +/- %.1f" %(po[0],sp.sqrt(po_cov[0,0])))
print(" mu = %.1f +/- %.1f"%(po[1],sp.sqrt(po_cov[1,1])))
print(" Gaussian width (sigma) = %.1f +/- %.1f"%(po[2],sp.sqrt(po_cov[2,2])))
print("and the background estimate is")
print(" m = %.2f +/- %.2f"%(po[3],sp.sqrt(po_cov[3,3])))
print(" c = %.0f +/- %.0f"%(po[4],sp.sqrt(po_cov[4,4])))

plt.plot(x,fit_gauss(x,po[0],po[1],po[2],po[3],po[4]),label='Fit results')
plt.legend()

plt.show()

Console response

The slope and intercept of the regression is, 699564146.8510102 -314.0882660868497
The signal parameters are
 Gaussian amplitude = 26.7 +/- 0.6
 mu = 0.0 +/- 0.0
 Gaussian width (sigma) = 0.0 +/- 0.0
and the background estimate is
 m = 595726198.94 +/- 7933451.82
 c = -247 +/- 6

Console response
Console response

Plot of data
Plot of data


Solution

  • You are doing everything right until it comes to the prints at the bottom of your code. So you write:

    print(" mu = %.1f +/- %.1f"%(po[1],sp.sqrt(po_cov[1,1])))
    

    You specify your precision with %.1f, which means that your number of digits will be limited to a single digit after the dot. E.g. 1294.423 becomes 1294.4 but 0.001 will become 0.0! For more info, check for example:
    https://pyformat.info/
    Looking at your plot, you can see that your data has a magnitude of 1e-7, therefore the formatting of your string is working as expected. You could use exponential formatting instead, i.e. %.1e or just get your magnitude right by scaling the data beforehand!