I'm currently comparing several traditional methods for selecting an exponential trend in a series of points. In my industry, trends are often selected without concern for measures of fit; a common method is simply to measure year-over-year change and average those changes over a period of time. This produces a growth factor, but not a fitted curve, so it's difficult to compare against an exponential regression or another approach. So my question:
If I have a pre-selected, fixed trend factor for an exponential curve, is there a simple method for optimizing the 'intercept' value that would minimize the squared error of the overall fit to a set of data? Consider the following:
import numpy as np
from sklearn.metrics import r2_score
from scipy.optimize import curve_fit

# Define exponential function
def ex(x, a, b):
    return a * b**x

# Seed data with normally distributed error
x = np.linspace(1, 20, 20)
np.random.seed(100)
y = ex(x, 100, 1.01) + 3 * np.random.randn(20)

# Exponential regression for trend value, fitted values, and r_sq measure
popt, pcov = curve_fit(ex, x, y)
trend, fit, r_sq = popt[1] - 1, ex(x, *popt), r2_score(y, ex(x, *popt))

# Mean yr/yr change as an alternative measure of trend
# (note this divides each difference by the later year's value)
trend_yryr = np.mean(np.diff(y) / y[1:])

print(trend)
print(trend_yryr)
The mean year/year change produces a different trend value for the data, and I'd like to compare it to the trend selected by the exponential regression. My goal is to find the intercept that minimizes squared error for this alternative trend value over the data, and to measure that squared error for comparison. Thanks for the help.
One way of fixing a parameter when using curve_fit is to pass a lambda function that hardcodes the parameters you want to fix. Here's how I did it:
# ... all your preamble, but importing matplotlib

# Fix b to the mean yr/yr trend and let curve_fit optimize only a
new_b = trend_yryr + 1
popt2, pcov2 = curve_fit(lambda x, a: ex(x, a, new_b), x, y)
fit2, r_sq2 = ex(x, *popt2, new_b), r2_score(y, ex(x, *popt2, new_b))
popt is array([100.24989416, 1.00975864]) and popt2 is array([99.09513612]). Plotting the results gives me this: [plot of the data with both fitted curves omitted]
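Since part of your goal is to measure the squared error of the fixed-trend fit for comparison, here's a minimal sketch (reusing fit, fit2, r_sq, and r_sq2 from above) that prints the sum of squared errors alongside the R² values:

# Compare goodness of fit: sum of squared errors and R² for
# the free fit (fit) vs. the fixed-trend fit (fit2)
sse1 = np.sum((y - fit) ** 2)
sse2 = np.sum((y - fit2) ** 2)
print(sse1, r_sq)   # free a and b
print(sse2, r_sq2)  # b fixed at trend_yryr + 1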
You could use lmfit to do this perhaps a bit more elegantly, but it's essentially up to you.
from lmfit import Parameters, minimize

# Residual to minimize: difference between the model and the data
def residual(params, x, y):
    a = params['a']
    b = params['b']
    model = ex(x, a, b)
    return model - y

# Free fit: both a and b vary
p1 = Parameters()
p1.add('a', value=1, vary=True)
p1.add('b', value=1, vary=True)

# Constrained fit: b is held at the mean yr/yr trend
p2 = Parameters()
p2.add('a', value=1, vary=True)
p2.add('b', value=np.mean(np.diff(y) / y[1:]) + 1, vary=False)  # important

out1 = minimize(residual, p1, args=(x, y))
out2 = minimize(residual, p2, args=(x, y))
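To inspect the results: each minimize call returns a MinimizerResult whose .params attribute holds the fitted parameters, so you can pull the values out like this:

# a is fitted in both; b stays at its fixed value in out2
print(out1.params['a'].value, out1.params['b'].value)
print(out2.params['a'].value, out2.params['b'].value)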
The outputs of both methods are essentially the same, so I won't post them again.
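As a side note, when b is fixed you don't strictly need an optimizer at all: minimizing sum((y - a*b**x)**2) over a alone is linear in a and has a closed form. A sketch using the arrays defined earlier:

# Closed-form least-squares intercept for a fixed growth factor:
# setting d/da sum((y - a*w)**2) = 0 with w = b**x gives
#   a_hat = sum(y*w) / sum(w*w)
w = new_b ** x
a_hat = np.sum(y * w) / np.sum(w * w)
print(a_hat)  # should agree with popt2[0] and out2.params['a']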