Tags: python, scipy, curve-fitting, least-squares, trend

Minimize squared error for fixed parameter regression


I'm currently comparing several traditional methods for selecting an exponential trend in a series of points. In my industry, trends are often selected without regard for measures of fit, so a common method is simply to measure year-over-year change and average those changes over a period of time. This produces a trend factor but not a fitted curve, which makes it difficult to compare against an exponential regression or other approaches. So my question:

If I have a pre-selected, fixed trend factor for an exponential curve, is there a simple method for optimizing the 'intercept' value which would minimize the squared error of the overall fit to a set of data? Consider the following:

import numpy as np
from sklearn.metrics import r2_score
from scipy.optimize import curve_fit

# Define exponential function
def ex(x, a, b):
    return a*b**x

# Seed data with normally distributed error
x = np.linspace(1, 20, 20)
np.random.seed(100)
y = ex(x, 100, 1.01) + 3*np.random.randn(20)

# Exponential regression: trend value, fitted values, and R^2 measure
popt, pcov = curve_fit(ex, x, y)
trend, fit, r_sq = popt[1] - 1, ex(x, *popt), r2_score(y, ex(x, *popt))

# Mean yr/yr change as an alternative measure of trend
trend_yryr = np.mean(np.diff(y)/y[1:])

print(trend)
print(trend_yryr)

The mean year/year change produces a different trend value for the data, and I'd like to compare it to the trend selected by the exponential regression. My goal is to find the intercept that minimizes the squared error over the data for this alternative trend value, and then to measure that squared error for comparison. Thanks for the help.


Solution

  • One way of fixing a parameter when using curve_fit is to pass a lambda function that hardcodes the parameters you want to fix. Here's how I did it:

    # ... all your preamble, but importing matplotlib
    new_b = trend_yryr + 1  # fix b at the mean yr/yr trend factor
    popt2, pcov2 = curve_fit(lambda x, a: ex(x, a, new_b), x, y)  # only a is free
    fit2, r_sq2 = ex(x, *popt2, new_b), r2_score(y, ex(x, *popt2, new_b))
    

    popt is array([100.24989416, 1.00975864]) and popt2 is array([99.09513612]). Plotting the results gives me this:

    [Plot of the data with the free exponential fit and the fixed-trend fit]
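    Incidentally, once b is fixed, minimizing the squared error over a alone is a one-parameter linear least-squares problem with a closed-form solution, so you can cross-check curve_fit without any iterative fitting. A minimal sketch, assuming the x, y, and new_b names from above:

    # SSE(a) = sum((y - a*b**x)**2) is quadratic in a for fixed b;
    # setting dSSE/da = 0 gives a = sum(y * b**x) / sum(b**(2*x))
    bx = new_b ** x
    a_closed = np.sum(y * bx) / np.sum(bx**2)
    print(a_closed)  # should match popt2[0] up to numerical precision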

    You could also use lmfit to do this, perhaps a bit more elegantly; whether it's worth the extra dependency is up to you.

    from lmfit import Parameters, minimize
    
    def residual(params, x, y):
        # Residual vector; lmfit's minimize() does least squares on this by default
        a = params['a']
        b = params['b']
        model = ex(x, a, b)
        return model - y
    
    p1 = Parameters()
    p1.add('a', value=1, vary=True)
    p1.add('b', value=1, vary=True)
    
    p2 = Parameters()
    p2.add('a', value=1, vary=True)
    p2.add('b', value=np.mean(np.diff(y)/y[1:]) + 1, vary=False)  # important: b is held fixed at the mean yr/yr trend factor
    
    out1 = minimize(residual, p1, args=(x, y))
    out2 = minimize(residual, p2, args=(x, y))
    

    The outputs of both methods are essentially the same, so I won't post them again.
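    To actually measure the squared error the question asks about, something like this works, reusing the y, fit, fit2, r_sq, and r_sq2 names defined in the snippets above:

    sse = np.sum((y - fit)**2)    # squared error of the free exponential fit
    sse2 = np.sum((y - fit2)**2)  # squared error of the fixed yr/yr-trend fit
    print(sse, r_sq)    # the free fit's SSE is always <= the fixed fit's
    print(sse2, r_sq2)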