Search code examples
pythonscipycurve-fittinggaussian

Trying to fit a gaussian with scipy


I have a diffractogram, and I need to fit a given peak to a gaussian function, I'm trying to use curve_fit from scipy.optimize for that but I'm getting some errors. First at all, because of my data, I've added a straight line to the equation to fit, so I have

enter image description here

When I try to fit this equation via

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, mu, sigma, m, b):
    return (1.0/(sigma*np.sqrt(2.0*np.pi)))*np.exp((-(x-mu)**2)/(2.0*sigma**2)) + m*x + b

#load the dataset
datadrx = np.genfromtxt('data.txt')

X = datadrx[:,0]
Y = datadrx[:,1]

mu0 = np.mean(Y)
sigma0 = np.std(Y)

popt, pcov = curve_fit(gauss, X, Y, p0 = [mu0,sigma0, 1.0, 1.0])
plt.title('Gaussian fit')
plt.plot(X, gauss(X,*popt))
plt.plot(X,Y)

I get this warning

/home/eariasp/miniconda3/envs/herrcomp/lib/python3.10/site-packages/scipy/optimize/_minpack_py.py:906: OptimizeWarning: Covariance of the parameters could not be estimated
  warnings.warn('Covariance of the parameters could not be estimated',

And a straight line as a fit, see

enter image description here

When I remove the mx+b part I get a 'better' fit (I mean better because at least it's not just a straight line), but it's still far away from what I want

enter image description here

So, my question is, what is wrong? I found that it's a good idea to define the guess parameters p0 so I did it, but it didn't work in my case

In order to reproduce, you can use this dataset

Check that values in the first column are one close to another, I don't know if that has something to do, like subtractive cancellation, taking e to a negative small power or something; I found something about that but really I didn't get it


Solution

  • Yes, you need initial parameters; no, you're not calculating them correctly. The mean of Y is not the mean of your random variable, and the standard deviation of Y is not the standard deviation of your random variable.

    The better your constraints and guesses are, the faster and more reliably your fit will converge. Thankfully, for a Gaussian such constraints and guesses are reasonably straightforward to calculate - and in fact the guess is so good that for some (not all) applications, fit isn't even needed.

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.optimize import curve_fit
    
    
    def gauss(x: np.ndarray, a: float, mu: float, sigma: float, b: float) -> np.ndarray:
        return (
            a/sigma/np.sqrt(2*np.pi)
        )*np.exp(
            -0.5 * ((x-mu)/sigma)**2
        ) + b
    
    
    X, Y = np.loadtxt('data.txt', delimiter=' ').T
    xmin, xmax = X.min(), X.max()  # left and right bounds
    i_max = Y.argmax()             # index of highest value - for guess, assumed to be Gaussian peak
    ymax = Y[i_max]     # height of guessed peak
    mu0 = X[i_max]      # centre x position of guessed peak
    b0 = Y[:20].mean()  # height of baseline guess
    
    # https://en.wikipedia.org/wiki/Gaussian_function#Properties
    # Index of first argument to be at least halfway up the estimated bell
    i_half = np.argmax(Y >= (ymax + b0)/2)
    # Guess sigma from the coordinates at i_half. This will work even if the point isn't at exactly
    # half, and even if this point is a distant outlier the fit should still converge.
    sigma0 = (mu0 - X[i_half]) / np.sqrt(
        2*np.log(
            (ymax - b0)/(Y[i_half] - b0)
        )
    )
    
    a0 = (ymax - b0) * sigma0 * np.sqrt(2*np.pi)
    p0 = a0, mu0, sigma0, b0
    
    popt, _ = curve_fit(
        f=gauss, xdata=X, ydata=Y, p0=p0,
        bounds=(
            (     1, xmin,           0,    0),
            (np.inf, xmax, xmax - xmin, ymax),
        ),
    )
    print('Guess:', np.array(p0))
    print('Fit:  ', popt)
    
    fig, ax = plt.subplots()
    ax.set_title('Gaussian fit')
    ax.scatter(X, Y, marker='+', label='experiment', color='orange')
    ax.plot(X, gauss(X, *p0), label='guess', color='lightgrey')
    ax.plot(X, gauss(X, *popt), label='fit')
    ax.legend()
    plt.show()
    
    Guess: [2.43599321e+02 6.86947536e+01 2.23407054e-01 3.02000000e+02]
    Fit:   [2.28882618e+02 6.86885140e+01 2.25575284e-01 3.02398123e+02]
    

    fit and guess

    Your data are also too noisy and don't have enough samples for m to have any meaning.