I am trying to fit temperature and precipitation data to a periodic function using curve_fit. For some reason, curve_fit does not appear to test the entire parameter space as defined by the bounds parameter. I threw together a little model to demonstrate this.
############################################################
# First generate some data
import numpy as np
# Seed the random number generator for reproducibility
np.random.seed(0)
def test_func(x, b, c, d, e=0.0):
    return e*x + b * np.sin(2*np.pi * x/c + d)

def func_err(x, y):
    # Mean squared error between two arrays.
    sumxy = 0.0
    pos = 0
    for xtst in x:
        sumxy += np.square(xtst - y[pos])
        pos += 1
    sumxy /= len(x)
    return sumxy
x_data = np.linspace(0, 10, num=50)*2*np.pi
y_data = test_func(x_data,5.0,20,-0.20, 0.1) + 1.0*np.random.normal(size=50)
# And plot it
import matplotlib.pyplot as plt
############################################################
# Now fit a simple sine function to the data
from scipy import optimize
params, params_covariance = optimize.curve_fit(test_func, x_data, y_data,
                                               p0=[1.0, 18, 0.0, 0.0],
                                               bounds=([0.1, 5, -5.0, -5.0],
                                                       [100, 100, 5.0, 5.0]))
print([params, func_err(y_data, test_func(x_data, params[0], params[1], params[2], params[3]))])
############################################################
# And plot the resulting curve on the data
plt.figure(figsize=(6, 4))
plt.scatter(x_data, y_data, label='Data')
plt.plot(x_data, test_func(x_data, params[0], params[1], params[2], params[3]),
         label='Fitted function')
plt.legend(loc='best')
plt.show()
With p0=[1.0, 18, 0.0, 0.0] as given, the routine finds a solution just fine, but with initial values like p0=[1.0, 10, 0.0, 0.0] it fails pretty dramatically. Why does the routine not cover the range given by the bounds to find its solution?
I think this is due to the nature of the periodic function. The parameter c determines the periodicity of the function. When your initial guess of the periodicity is far from the right one, the fit gets stuck at a local minimum. You can think of it like this: with p0=[1.0, 10, 0.0, 0.0], the fitting algorithm finds a local best fit, as shown in the second figure, at [0.65476428, 11.14188385, -1.09652992, 0.08971854] for [b, c, d, e]. It then tries to move the parameters a little in every direction, but the gradient around that point suggests it is already the best fit, as if it were sitting in a "valley" of the parameter space, so the iteration stops there.
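One way to see this valley structure is to plot the error as a function of c alone, holding the other parameters at the values used to generate the data. This is only a rough illustration (the real landscape couples all four parameters), reusing your test_func and func_err:
c_grid = np.linspace(5, 100, 500)
# Error along c with b, d, e fixed at the data-generating values
# (b=5.0, d=-0.20, e=0.1). The many local minima are what trap the fit.
errors = [func_err(y_data, test_func(x_data, 5.0, c_, -0.20, 0.1))
          for c_ in c_grid]
plt.figure(figsize=(6, 4))
plt.plot(c_grid, errors)
plt.xlabel('c (period)')
plt.ylabel('mean squared error')
plt.show()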
curve_fit is not going to explore your entire parameter space: it only starts from your initial guess, p0 in this case, and uses local methods to find a nearby optimum.
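You can check this directly by running the same fit from the two starting points in your question and comparing the errors (a minimal check, reusing your test_func and func_err):
# Same bounds, two different starting guesses: each converges to a
# different local optimum with a very different error.
for guess in ([1.0, 18, 0.0, 0.0], [1.0, 10, 0.0, 0.0]):
    p, _ = optimize.curve_fit(test_func, x_data, y_data, p0=guess,
                              bounds=([0.1, 5, -5.0, -5.0],
                                      [100, 100, 5.0, 5.0]))
    print(guess, '->', p, 'MSE:', func_err(y_data, test_func(x_data, *p)))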
If you want to explore the entire parameter space for parameter c, you can implement a simple grid search: try values across the bounds for c, run curve_fit for each c value, and pick the result with the least fitting error.
Here is an example code:
def MSE(params, x_data, y_data):
    "to calculate the mean squared error; this is the same as your func_err"
    return ((test_func(x_data, *params) - y_data)**2).mean()

besterror = 10000
bestParam = None
for c_ in np.arange(5, 100, 1):
    # Grid search for parameter c between 5 and 100, step size 1.
    params, params_covariance = optimize.curve_fit(test_func, x_data, y_data,
                                                   p0=[1.0, c_+0.5, 0.0, 0.0],
                                                   bounds=([0.1, c_, -5.0, -5.0],
                                                           [100, c_+1, 5.0, 5.0]))
    error = MSE(params, x_data, y_data)
    if error < besterror:
        besterror = error
        bestParam = params

params = bestParam
print([params, func_err(y_data, test_func(x_data, params[0], params[1], params[2], params[3]))])
############################################################
# And plot the resulting curve on the data
plt.figure(figsize=(6, 4))
plt.scatter(x_data, y_data, label='Data')
plt.plot(x_data, test_func(x_data, params[0], params[1], params[2], params[3]),
         label='Fitted function')
plt.legend(loc='best')
plt.show()
A grid search over the other parameters is unnecessary in this case, because curve_fit is good enough at finding their optimal values on its own. This is a brute-force method; there might be libraries that can help you do this in a more efficient way.
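For example, scipy itself ships a stochastic global optimizer, scipy.optimize.differential_evolution, which searches the whole bounded region instead of starting from a single guess. Here is a minimal sketch that minimizes the same mean squared error over the bounds from your question (the seed value is an arbitrary choice for reproducibility):
from scipy.optimize import differential_evolution

def objective(params):
    # Mean squared error of the model for one candidate parameter vector.
    return ((test_func(x_data, *params) - y_data)**2).mean()

# Same bounds as in the question, written as one (min, max) pair per parameter.
bounds = [(0.1, 100), (5, 100), (-5.0, 5.0), (-5.0, 5.0)]
result = differential_evolution(objective, bounds, seed=0)
print(result.x, result.fun)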