Trying to fit a gaussian with scipy

I have a diffractogram, and I need to fit a given peak to a gaussian function, I'm trying to use curve_fit from scipy.optimize for that but I'm getting some errors. First at all, because of my data, I've added a straight line to the equation to fit, so I have

When I try to fit this equation via

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, mu, sigma, m, b):
    return (1.0/(sigma*np.sqrt(2.0*np.pi)))*np.exp((-(x-mu)**2)/(2.0*sigma**2)) + m*x + b

#load the dataset
datadrx = np.genfromtxt('data.txt')

X = datadrx[:,0]
Y = datadrx[:,1]

mu0 = np.mean(Y)
sigma0 = np.std(Y)

popt, pcov = curve_fit(gauss, X, Y, p0 = [mu0,sigma0, 1.0, 1.0])
plt.title('Gaussian fit')
plt.plot(X, gauss(X,*popt))
plt.plot(X,Y)

I get this warning

/home/eariasp/miniconda3/envs/herrcomp/lib/python3.10/site-packages/scipy/optimize/_minpack_py.py:906: OptimizeWarning: Covariance of the parameters could not be estimated
  warnings.warn('Covariance of the parameters could not be estimated',

And a straight line as a fit, see

When I remove the mx+b part I get a 'better' fit (I mean better because at least it's not just a straight line), but it's still far away from what I want

So, my question is, what is wrong? I found that it's a good idea to define the guess parameters p0 so I did it, but it didn't work in my case

In order to reproduce, you can use this dataset

Check that values in the first column are one close to another, I don't know if that has something to do, like subtractive cancellation, taking e to a negative small power or something; I found something about that but really I didn't get it

Solution

Yes, you need initial parameters; no, you're not calculating them correctly. The mean of Y is not the mean of your random variable, and the standard deviation of Y is not the standard deviation of your random variable.

The better your constraints and guesses are, the faster and more reliably your fit will converge. Thankfully, for a Gaussian such constraints and guesses are reasonably straightforward to calculate - and in fact the guess is so good that for some (not all) applications, fit isn't even needed.

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit


def gauss(x: np.ndarray, a: float, mu: float, sigma: float, b: float) -> np.ndarray:
    return (
        a/sigma/np.sqrt(2*np.pi)
    )*np.exp(
        -0.5 * ((x-mu)/sigma)**2
    ) + b


X, Y = np.loadtxt('data.txt', delimiter=' ').T
xmin, xmax = X.min(), X.max()  # left and right bounds
i_max = Y.argmax()             # index of highest value - for guess, assumed to be Gaussian peak
ymax = Y[i_max]     # height of guessed peak
mu0 = X[i_max]      # centre x position of guessed peak
b0 = Y[:20].mean()  # height of baseline guess

# https://en.wikipedia.org/wiki/Gaussian_function#Properties
# Index of first argument to be at least halfway up the estimated bell
i_half = np.argmax(Y >= (ymax + b0)/2)
# Guess sigma from the coordinates at i_half. This will work even if the point isn't at exactly
# half, and even if this point is a distant outlier the fit should still converge.
sigma0 = (mu0 - X[i_half]) / np.sqrt(
    2*np.log(
        (ymax - b0)/(Y[i_half] - b0)
    )
)

a0 = (ymax - b0) * sigma0 * np.sqrt(2*np.pi)
p0 = a0, mu0, sigma0, b0

popt, _ = curve_fit(
    f=gauss, xdata=X, ydata=Y, p0=p0,
    bounds=(
        (     1, xmin,           0,    0),
        (np.inf, xmax, xmax - xmin, ymax),
    ),
)
print('Guess:', np.array(p0))
print('Fit:  ', popt)

fig, ax = plt.subplots()
ax.set_title('Gaussian fit')
ax.scatter(X, Y, marker='+', label='experiment', color='orange')
ax.plot(X, gauss(X, *p0), label='guess', color='lightgrey')
ax.plot(X, gauss(X, *popt), label='fit')
ax.legend()
plt.show()

Guess: [2.43599321e+02 6.86947536e+01 2.23407054e-01 3.02000000e+02]
Fit:   [2.28882618e+02 6.86885140e+01 2.25575284e-01 3.02398123e+02]

Your data are also too noisy and don't have enough samples for m to have any meaning.