Search code examples
pythonnumpymodel-fitting

Fit gaussian integral function to data


I have a problem with finding a least-square-fit for a set of given data. I know the data follows a function witch is a convolution of a gaussian and a rectangle (x-ray through a broad slit). What I have done so far is taken a look at the convolution integral and discover that it comes down the this: enter image description here the integration parameter a is the width of the slit (unknown and desired) with g(x-t) a gaussian function defined as enter image description here So basically the function to fit is a integratiofunction of a gaussian with the integration borders given by the width parameter a. The integration is then also carried out with a shift of x-t.

Here is a smaller part of the Data and a handmade fit. from pylab import * from scipy.optimize import curve_fit from scipy.integrate import quad

# 1/10 of the Data to show the form.
xData = array([-0.1 , -0.09, -0.08, -0.07, -0.06, -0.05, -0.04, -0.03, -0.02,
       -0.01,  0.  ,  0.01,  0.02,  0.03,  0.04,  0.05,  0.06,  0.07,
        0.08,  0.09,  0.1 ])
yData = array([  18.      ,   22.      ,   22.      ,   34.000999,   54.002998,
        152.022995,  398.15799 ,  628.39502 ,  884.781982,  848.719971,
        854.72998 ,  842.710022,  762.580994,  660.435974,  346.119995,
        138.018997,   40.001999,    8.      ,    6.      ,    4.      ,
        6.      ])
yerr = 0.1*yData # uncertainty of the data

plt.scatter(xData, yData)
plt.show()

plot of the Data

# functions
def gaus(x, *p):
    """ gaussian with p = A, mu, sigma """
    A, mu, sigma = p
    return A/(sqrt(2*pi)*sigma)*numpy.exp(-(x-mu)**2/(2.*sigma**2))

def func(x,*p):
    """ Convolution of gaussian and rectangle is a gaussian integral.
        Parameters: A, mu, sigma, a"""
    A, mu, sigma, a = p
    return quad(lambda t: gaus(x-t,A,mu,sigma),-a,a)
vfunc = vectorize(func)  # Probably this is a Problem but if I dont use it, func can only be evaluated at 1 point not an array

To see that func does indeed describe the data and my calculatons are right I played around with data and function and tired to match them. I found the following to be feasible:

p0=[850,0,0.01, 0.04] # will be used as starting values for fitting
sample = linspace(-0.1,0.1,200) # just to make the plot smooth
y, dy = vfunc(sample,*p0)       

plt.plot(sample, y, label="Handmade Fit")
plt.scatter(xData, yData, label="Data")
plt.legend()
plt.show()

Data and handmade fit The problem occurs, when I try to fit the data using the just obtained starting values:

fp, Sfcov =  curve_fit(vfunc, xData, yData, p0=p0, sigma=yerr)
yf = vfunc(xData, fp)
plt.plot(x, yf, label="Fit")
plt.show()


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-83-6d362c4b9204> in <module>()
----> 1 fp, Sfcov =  curve_fit(vfunc, xData, yData, p0=p0, sigma=yerr)
      2 yf = vfunc(xData,fp)
      3 plt.plot(x,yf, label="Fit")

    /usr/lib/python3/dist-packages/scipy/optimize/minpack.py in curve_fit(f, xdata, ydata, p0, sigma, **kw)
    531     # Remove full_output from kw, otherwise we're passing it in twice.
    532     return_full = kw.pop('full_output', False)
--> 533     res = leastsq(func, p0, args=args, full_output=1, **kw)
    534     (popt, pcov, infodict, errmsg, ier) = res
    535 

/usr/lib/python3/dist-packages/scipy/optimize/minpack.py in leastsq(func, x0, args, Dfun, full_output, col_deriv, ftol, xtol, gtol, maxfev, epsfcn, factor, diag)
    369     m = shape[0]
    370     if n > m:
--> 371         raise TypeError('Improper input: N=%s must not exceed M=%s' % (n, m))
    372     if epsfcn is None:
    373         epsfcn = finfo(dtype).eps

TypeError: Improper input: N=4 must not exceed M=2

I think this does mean I have less data points than fit-parameters. Well lets look at it:

print("Fit-Parameters: %i"%len(p0))
print("Datapoints: %i"%len(yData))

Fit-Parameters: 4
Datapoints: 21

And actually I have 210 data points.

Like written above I don't really understand why I need to use the vectorise function from numpy for the integral-function (func <> vfunc) but not using it doesnt help either. In general one can pass a numpy array to a function but it appears to be not working here. On the other hand, I might be overestimating the power of leas-square-fit here and it might not be usable in this case but I do not like to use maximum-likelihood here. In general I have never tried to fit a integral function to data so this is new to me. Likely the problem is here. My knowledge of quad is limited and there might be a better way. Carrying out the integral analytically is not possible to my knowledge but clearly would be the ideal solution ;).

So any ideas where this error comes from?


Solution

  • You have two problems. One is that quad returns a tuple with the value and an estimate of the error, and the other is in how you are vectorizing. You don't want to vectorize on the vectors parameter. np.vectorize has a for loop, so there is no performance gain from doing it yourself:

    def func(x, p):
        """ Convolution of gaussian and rectangle is a gaussian integral.
            Parameters: A, mu, sigma, a"""
        A, mu, sigma, a = p
        return quad(lambda t: gaus(x-t,A,mu,sigma),-a,a)[0]
    
    def vfunc(x, *p):
        evaluations = numpy.array([func(i, p) for i in x])
        return evaluations
    

    Note that I have taken the * in func away, but not from gaus. Also, I am selecting the first output of quad.

    Whereas this solves your problem, to fit a convolution you may consider going to Fourier space. The Fourier transform of a convolution is the product of the transforms of the functions, and this is going to simplify your life a lot. Furthermore, once in Fourier space you may consider applying a low-pass filter to reduce noise. 210 data points are high enough to get good results.

    Also, if you need more powerful algorithms, you should consider iminuit, using ROOT's long proven Minuit.