Here I have defined a function to return the sum of an arbitrary number of Gaussian Distributions:
import numpy
from numpy import *
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def GaussSum(x,*p):
n=len(p)/3
A=p[:n]
w=p[n:2*n]
c=p[2*n:3*n]
return sum([ A[i]*exp(-(x-c[i])**2./(2.*(w[i])**2.))/(2*pi*w[i]**2)**0.5 for i in range(n)])
I then proceed to generate x and y data for a given set of parameters and ask curve_fit to fit this data with the initial parameters matching the generating set. I tried this for many different sets including both single and multiple Gaussians.
params = [1.,1.,-3.]; #parameters for a single gaussian
#params=[1.,1.,1.,2.,-3.,0.]; #parameters for the sum of two gaussians
xdata=arange(-6,6,0.01)
ydata = array([GaussSum(x,*params) for x in xdata])
popt,pcov = curve_fit(GaussSum,xdata,ydata,p0=params)
print popt
print pcov
Every parameter set gives me a non-fits, even though I already supposedly started the fit at the solution. (In the above single Gaussian):
[ 52.18242366 5549.66965192 15678.51803797]
inf
I know that the function itself operates normally, as I have plotted with it and verified its validity.
The problem here is that curve_fit expects the function you are optimising to take a vector of inputs and return a vector of outputs. As written, the function GaussSum takes a single input or a vector of inputs (i.e. a numpy.array) but either way it returns a single scalar output. The curve_fit function then fails to find a good optimum fit.
In order to clarify what's going on, it's always advisable when using numpy (or any external library) to always use the namespace explicitly, as shown in following working version:
import numpy
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def GaussSum(x,*p):
n=len(p)/3
A=p[:n]
w=p[n:2*n]
c=p[2*n:3*n]
y = sum([ A[i]*numpy.exp(-(x-c[i])**2./(2.*(w[i])**2.))/(2*numpy.pi*w[i]**2)**0.5 for i in range(n)])
return y
params = [1.,1.,-3.]; #parameters for a single gaussian
#params=[1.,1.,1.,2.,-3.,0.]; #parameters for the sum of two gaussians
xdata=numpy.arange(-6,6,0.01)
ydata = numpy.array([GaussSum(x,*params) for x in xdata])
popt,pcov = curve_fit(GaussSum,xdata,ydata,p0=params)
Specifically you were implicitly calling numpy.sum, which would aggregate the numpy.array and return a single value, while you needed to use the builtin python sum, which aggregates the native list of numpy.array to one numpy.array.