I am attempting to fit some data with a double Gaussian profile. The data looks almost perfectly Gaussian, but try as I might, I can't get a fit better than a certain shape, regardless of the initial guesses I input. I've tried to use the two gaussian equations listed below, but neither fit quite right. Overall I'd like it to be flatter on the continuum (no 'wings') and have a smoother, closer fit to the actual shape if possible.
Due to the nature of the follow-up analysis, the fit needs to be a double Gaussian, as I require the fitting parameters, and thus I can't consider other fitting methods. The data can be found here: https://docs.google.com/spreadsheets/d/1kMO2ogAL8ZCiDeY29kBvv5lzMfAD7dLj-5rKW8kW9Go/edit?usp=sharing
Below is an example of the code I've been using to try and fit the data, as well as the output figure.
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
from scipy.optimize import curve_fit
from lmfit import Model
with open("data.txt","r") as f:
content=[i.strip() for i in f.readlines()]
vel=[]
I=[]
dI=[]
for i in range(8,len(content)):
line=content[i].split()
vel.append(float(line[0]))
I.append(float(line[1]))
dI.append(float(line[2]))
def gaussian(x, A, x0, sig):
return A*np.exp(-(x-x0)**2/(2*sig**2))
def gaussian2(x, amp, cen, wid):
return (amp/(np.sqrt(2*np.pi)*wid))*np.exp(-(x-cen)**2/(2*wid**2))
def multi_gaussian(x, *pars):
offset = pars[-1]
g1 = gaussian(x, pars[1], pars[0], pars[2])
g2 = gaussian(x, pars[3], pars[0], pars[4])
return g1 + g2 + offset
def multi_gaussian2(x, *pars):
offset = pars[-1]
g1 = gaussian2(x, pars[1], pars[0], pars[2])
g2 = gaussian2(x, pars[3], pars[0], pars[4])
return g1 + g2 + offset
offset=1
guess = [-15,-0.01,10,-0.01,10,1]
popt, pcov = curve_fit(multi_gaussian, vel, I, guess)
popt2, pcov2 = curve_fit(multi_gaussian2, vel, I, guess)
x=np.linspace(np.min(vel),np.max(vel), 2000)
plt.figure()
plt.scatter(vel,I,s=0.1,c='b')
plt.plot(x, multi_gaussian(x, *popt), 'r--', linewidth=1,label='Gaussian1')
plt.plot(x, multi_gaussian2(x, *popt2), 'g--', linewidth=1,label='Gaussian2')
plt.legend(loc='best')
plt.show()
The data in your linked spreadsheet only has 2 significant digits for velocity and intensity. That makes it basically impossible to try to "refine" your fit to get a better result. That said, I highly recommend using a lmfit script like this, that will include your intensity uncertainties in the fit:
import matplotlib.pyplot as plt
import numpy as np
from lmfit.models import GaussianModel, ConstantModel
data = np.loadtxt('ddata.txt', skiprows=1)
v = data[:, 0]
i = data[:, 1]
di = data[:, 2]
model = (ConstantModel(prefix='offset_') +
GaussianModel(prefix='p1_') +
GaussianModel(prefix='p2_'))
params = model.make_params(offset_c=1,
p1_amplitude=-1., p1_sigma=100, p1_center=25,
p2_amplitude=-1., p2_sigma=100, p2_center=-25)
init = model.eval(params, x=v)
result = model.fit(i, params, weights=1.0/(di+1.e-9), x=v)
print(result.fit_report())
plt.figure()
plt.scatter(v, i, s=0.5, label='data')
plt.plot(v, init, label='init')
plt.plot(v, result.best_fit, label='fit')
plt.legend()
plt.xlabel('velocity (mm/s)')
plt.ylabel('intensity')
plt.show()
For the data you supplied, this will print out a fit report like this:
[[Model]]
((Model(constant, prefix='offset_') + Model(gaussian, prefix='p1_')) + Model(gaussian, prefix='p2_'))
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 128
# data points = 191
# variables = 7
chi-square = 654.770994
reduced chi-square = 3.55853801
Akaike info crit = 249.314315
Bayesian info crit = 272.080229
[[Variables]]
offset_c: 1.00013943 +/- 5.1045e-05 (0.01%) (init = 1)
p1_amplitude: -1.36807407 +/- 0.08677931 (6.34%) (init = -1)
p1_center: 46.8019583 +/- 3.77807981 (8.07%) (init = 25)
p1_sigma: 57.3859589 +/- 2.39823612 (4.18%) (init = 100)
p2_amplitude: -1.16999330 +/- 0.08533205 (7.29%) (init = -1)
p2_center: -76.1117581 +/- 3.49975073 (4.60%) (init = -25)
p2_sigma: 51.7080694 +/- 2.08860434 (4.04%) (init = 100)
p1_fwhm: 135.133604 +/- 5.64741436 (4.18%) == '2.3548200*p1_sigma'
p1_height: -0.00951073 +/- 2.6406e-04 (2.78%) == '0.3989423*p1_amplitude/max(2.220446049250313e-16, p1_sigma)'
p2_fwhm: 121.763196 +/- 4.91828727 (4.04%) == '2.3548200*p2_sigma'
p2_height: -0.00902683 +/- 3.5183e-04 (3.90%) == '0.3989423*p2_amplitude/max(2.220446049250313e-16, p2_sigma)'
[[Correlations]] (unreported correlations are < 0.100)
C(p1_center, p2_amplitude) = -0.967
C(p1_amplitude, p2_center) = 0.959
C(p1_center, p2_center) = 0.956
C(p1_amplitude, p2_amplitude) = -0.946
C(p1_amplitude, p1_center) = 0.943
C(p2_amplitude, p2_center) = -0.943