Tags: python, scipy, least-squares

Python lmfit - why does the intercept estimate in a linear regression depend on the starting value?


I'm trying to figure out why the intercept estimate from the lmfit package differs depending on the starting value. I'm comparing the results to those of a standard OLS estimation using statsmodels. Can anyone assist?

Code:

from lmfit import minimize, Parameters
import numpy as np
import statsmodels.api as sm

x = np.linspace(0, 15, 10)
x_ols = sm.add_constant(x)  # prepend a column of ones for the intercept
y = np.arange(10)

model = sm.OLS(y, x_ols)
results = model.fit()
print("OLS: ", format(results.params[0], '.10f'), format(results.params[1], '.10f'))


# define objective function: returns the array to be minimized
def fcn2min(params, x, data):
    a = params['a'].value
    b = params['b'].value

    model = a + b * x
    return model - data

for i in range(-2, 3):
    # create a set of Parameters, with starting value i for the intercept
    params = Parameters()
    params.add('a', value=i)
    params.add('b', value=20)

    # do fit, here with the default leastsq algorithm
    result = minimize(fcn2min, params, args=(x, y))
    # print("lmfit: ", result.values)  # older usage
    print("lmfit: ", result.params.valuesdict())  # newer syntax

Solution

  • If you look at the results, you see that the constant is close to zero, with a magnitude of roughly the default tolerance of scipy.optimize.leastsq; the snippet after the output below checks what that default is.

    OLS:  -0.0000000000 0.6000000000
    lmfit:  {'a': 1.1967327242889913e-08, 'b': 0.59999999932429748}
    lmfit:  {'a': 3.2643846243929039e-08, 'b': 0.59999999717671448}
    lmfit:  {'a': 3.2644232427278463e-08, 'b': 0.59999999717679642}
    lmfit:  {'a': 1.636450187584131e-08, 'b': 0.59999999900662315}
    lmfit:  {'a': 3.3578611617157454e-08, 'b': 0.59999999693286854}
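
    The default can be checked directly; the following just inspects the signature of scipy.optimize.leastsq, where ftol and xtol both default to 1.49012e-08 (roughly the square root of machine epsilon, and about the size of the fitted constant):

    import inspect
    from scipy.optimize import leastsq

    # the printed signature includes ftol=1.49012e-08 and xtol=1.49012e-08
    print(inspect.signature(leastsq))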
    

    If I make the required tolerance tighter, the estimate of the constant moves closer to zero, and both estimated coefficients are closer to the OLS solution.

    After adding xtol to your code

    result = minimize(fcn2min, params, args=(x, y), xtol=1e-12)
    

    I get:

    OLS:  -0.0000000000 0.6000000000
    lmfit:  {'a': -3.5437341988915725e-32, 'b': 0.59999999999999998}
    lmfit:  {'a': -1.8490806236697864e-32, 'b': 0.59999999999999998}
    lmfit:  {'a': -9.8614500814838325e-32, 'b': 0.59999999999999998}
    lmfit:  {'a': 8.328833474508222e-09, 'b': 0.59999999921839664}
    lmfit:  {'a': -5.8547746020524404e-32, 'b': 0.59999999999999998}
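
    For completeness, ftol (the tolerance on the relative reduction of the sum of squares) can be tightened the same way, since lmfit forwards these keyword arguments to scipy.optimize.leastsq, just as it does with xtol above; a minimal sketch:

    # tighten both stopping criteria of the underlying leastsq call:
    # ftol bounds the relative reduction in the sum of squares,
    # xtol bounds the relative change in the parameter estimates
    result = minimize(fcn2min, params, args=(x, y), xtol=1e-12, ftol=1e-12)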
    

    OLS uses linear algebra to compute an explicit solution (based on the generalized inverse, pinv). Because of this, its numerical precision is high.
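
    To illustrate, the OLS estimate can be reproduced in one line with numpy's pseudoinverse, reusing x_ols and y from the question:

    import numpy as np

    # closed-form least-squares solution: beta = pinv(X) @ y
    beta = np.linalg.pinv(x_ols).dot(y)
    print("pinv solution:", beta)  # agrees with results.params to machine precision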

    The nonlinear optimization uses an iterative solver and stops as soon as it is within the required tolerances; the iteration counts in the snippet below illustrate this.
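
    One way to see this, reusing fcn2min, x, and y from the question, and assuming a recent lmfit where minimize returns a MinimizerResult whose nfev attribute counts residual evaluations:

    params = Parameters()
    params.add('a', value=-2)
    params.add('b', value=20)

    loose = minimize(fcn2min, params, args=(x, y))
    tight = minimize(fcn2min, params, args=(x, y), xtol=1e-12)

    # a tighter xtol makes the solver iterate longer before it stops
    print("evaluations, default xtol:", loose.nfev)
    print("evaluations, xtol=1e-12: ", tight.nfev)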

    There is still one case in the example that is only accurate to about 8e-9, but this can also be due to the numerical approximation of the gradient and other factors that influence the convergence checks.
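
    If the finite-difference gradient is the suspect, the step used for the forward-difference Jacobian can also be controlled; scipy.optimize.leastsq exposes this as epsfcn, which lmfit passes through just like xtol above. A sketch, not a tuned recommendation:

    # epsfcn sets the relative step for the forward-difference Jacobian
    # that scipy.optimize.leastsq uses when no analytic gradient is given
    result = minimize(fcn2min, params, args=(x, y), xtol=1e-12, epsfcn=1e-10)
    print("lmfit: ", result.params.valuesdict())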