Tags: python-3.x, gaussian

Fitting data with a double Gaussian


I am attempting to fit some data with a double Gaussian profile. The data looks almost perfectly Gaussian, but try as I might, I can't get a fit better than a certain shape, regardless of the initial guesses I supply. I've tried the two Gaussian functions listed below, but neither fits quite right. Overall I'd like the fit to be flatter on the continuum (no 'wings') and to follow the actual shape more closely if possible.

Due to the nature of the follow-up analysis, the fit needs to be a double Gaussian, as I require the fitting parameters, and thus I can't consider other fitting methods. The data can be found here: https://docs.google.com/spreadsheets/d/1kMO2ogAL8ZCiDeY29kBvv5lzMfAD7dLj-5rKW8kW9Go/edit?usp=sharing

Below is an example of the code I've been using to try to fit the data, as well as the output figure.

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
from scipy.optimize import curve_fit
from lmfit import Model

with open("data.txt","r") as f:
    content=[i.strip() for i in f.readlines()]

vel=[]
I=[]
dI=[]
for i in range(8,len(content)):
    line=content[i].split() 
    vel.append(float(line[0]))
    I.append(float(line[1]))
    dI.append(float(line[2]))

def gaussian(x, A, x0, sig):
    # Gaussian parameterized by peak height A
    return A*np.exp(-(x-x0)**2/(2*sig**2))

def gaussian2(x, amp, cen, wid):
    # Gaussian parameterized by area amp (peak height = amp/(sqrt(2*pi)*wid))
    return (amp/(np.sqrt(2*np.pi)*wid))*np.exp(-(x-cen)**2/(2*wid**2))

def multi_gaussian(x, *pars):
    # pars = [center, A1, sig1, A2, sig2, offset]; both components share pars[0] as center
    offset = pars[-1]
    g1 = gaussian(x, pars[1], pars[0], pars[2])
    g2 = gaussian(x, pars[3], pars[0], pars[4])
    return g1 + g2 + offset

def multi_gaussian2(x, *pars):
    # same parameter layout, using the area-normalized form
    offset = pars[-1]
    g1 = gaussian2(x, pars[1], pars[0], pars[2])
    g2 = gaussian2(x, pars[3], pars[0], pars[4])
    return g1 + g2 + offset

offset=1
guess = [-15,-0.01,10,-0.01,10,1]

popt, pcov = curve_fit(multi_gaussian, vel, I, guess)
popt2, pcov2 = curve_fit(multi_gaussian2, vel, I, guess)
x=np.linspace(np.min(vel),np.max(vel), 2000)

plt.figure()
plt.scatter(vel,I,s=0.1,c='b')
plt.plot(x, multi_gaussian(x, *popt), 'r--', linewidth=1,label='Gaussian1')
plt.plot(x, multi_gaussian2(x, *popt2), 'g--', linewidth=1,label='Gaussian2')
plt.legend(loc='best')
plt.show()

(Output figure: Gaussian fits to the data.)


Solution

  • The data in your linked spreadsheet has only 2 significant digits for velocity and intensity. That makes it basically impossible to "refine" your fit to get a much better result. That said, I highly recommend using an lmfit script like this, which will include your intensity uncertainties in the fit:

    import matplotlib.pyplot as plt
    import numpy as np

    from lmfit.models import GaussianModel, ConstantModel

    # columns: velocity, intensity, intensity uncertainty
    data = np.loadtxt('ddata.txt', skiprows=1)
    v = data[:, 0]
    i = data[:, 1]
    di = data[:, 2]

    # constant offset plus two Gaussian components
    model = (ConstantModel(prefix='offset_') +
             GaussianModel(prefix='p1_') +
             GaussianModel(prefix='p2_'))

    params = model.make_params(offset_c=1,
                               p1_amplitude=-1., p1_sigma=100, p1_center=25,
                               p2_amplitude=-1., p2_sigma=100, p2_center=-25)

    init = model.eval(params, x=v)
    # weight each point by 1/uncertainty (tiny constant guards against division by zero)
    result = model.fit(i, params, weights=1.0/(di+1.e-9), x=v)

    print(result.fit_report())

    plt.figure()
    plt.scatter(v, i, s=0.5, label='data')

    plt.plot(v, init, label='init')
    plt.plot(v, result.best_fit, label='fit')
    plt.legend()
    plt.xlabel('velocity (mm/s)')
    plt.ylabel('intensity')
    plt.show()
    

    For the data you supplied, this will print out a fit report like this:

    [[Model]]
        ((Model(constant, prefix='offset_') + Model(gaussian, prefix='p1_')) + Model(gaussian, prefix='p2_'))
    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 128
        # data points      = 191
        # variables        = 7
        chi-square         = 654.770994
        reduced chi-square = 3.55853801
        Akaike info crit   = 249.314315
        Bayesian info crit = 272.080229
    [[Variables]]
        offset_c:      1.00013943 +/- 5.1045e-05 (0.01%) (init = 1)
        p1_amplitude: -1.36807407 +/- 0.08677931 (6.34%) (init = -1)
        p1_center:     46.8019583 +/- 3.77807981 (8.07%) (init = 25)
        p1_sigma:      57.3859589 +/- 2.39823612 (4.18%) (init = 100)
        p2_amplitude: -1.16999330 +/- 0.08533205 (7.29%) (init = -1)
        p2_center:    -76.1117581 +/- 3.49975073 (4.60%) (init = -25)
        p2_sigma:      51.7080694 +/- 2.08860434 (4.04%) (init = 100)
        p1_fwhm:       135.133604 +/- 5.64741436 (4.18%) == '2.3548200*p1_sigma'
        p1_height:    -0.00951073 +/- 2.6406e-04 (2.78%) == '0.3989423*p1_amplitude/max(2.220446049250313e-16, p1_sigma)'
        p2_fwhm:       121.763196 +/- 4.91828727 (4.04%) == '2.3548200*p2_sigma'
        p2_height:    -0.00902683 +/- 3.5183e-04 (3.90%) == '0.3989423*p2_amplitude/max(2.220446049250313e-16, p2_sigma)'
    [[Correlations]] (unreported correlations are < 0.100)
        C(p1_center, p2_amplitude)    = -0.967
        C(p1_amplitude, p2_center)    =  0.959
        C(p1_center, p2_center)       =  0.956
        C(p1_amplitude, p2_amplitude) = -0.946
        C(p1_amplitude, p1_center)    =  0.943
        C(p2_amplitude, p2_center)    = -0.943
    

    and a plot showing the data, the initial model, and the best fit.
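
    Since you need the fitting parameters for your follow-up analysis, they live in result.params. As a minimal sketch (assuming the result object from the script above), you can pull out the best-fit values and 1-sigma uncertainties like this:

    # minimal sketch: read best-fit values and uncertainties from the lmfit result
    for name in ('offset_c',
                 'p1_amplitude', 'p1_center', 'p1_sigma',
                 'p2_amplitude', 'p2_center', 'p2_sigma'):
        par = result.params[name]
        print(f"{name} = {par.value:.5f} +/- {par.stderr:.5f}")

    Keep in mind that lmfit's GaussianModel uses the area-normalized form (like your gaussian2), so its amplitude is the area under the peak; the peak height and FWHM from your gaussian parameterization correspond to the derived p1_height/p1_fwhm and p2_height/p2_fwhm values in the fit report.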