
How to compute AIC for linear regression model in Python?


I want to compute AIC for linear models to compare their complexity. I did it as follows:

import numpy as np
from sklearn import linear_model

def aic(y, y_pred, k):
    resid = y - y_pred.ravel()
    sse = sum(resid ** 2)

    AIC = 2*k - 2*np.log(sse)

    return AIC

regr = linear_model.LinearRegression()
regr.fit(X, y)

aic_intercept_slope = aic(y, regr.coef_[0] * X.as_matrix() + regr.intercept_, k=1)

But I receive a "divide by zero encountered in log" error.


Solution

  • sklearn's LinearRegression is good for prediction but pretty barebones as you've discovered. (It's often said that sklearn stays away from all things statistical inference.)

    The results object returned by fitting statsmodels.regression.linear_model.OLS has an aic attribute and a number of other pre-canned attributes.

    However, note that you'll need to manually add a column of ones to your X matrix to include an intercept in your model; add_constant does that for you.

    from statsmodels.regression.linear_model import OLS
    from statsmodels.tools import add_constant
    
    regr = OLS(y, add_constant(X)).fit()
    print(regr.aic)
    

    Source is here if you are looking for an alternative way to write manually while still using sklearn.
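
    If you'd rather compute it by hand, here is a minimal sketch of the usual Gaussian-likelihood AIC for a least-squares fit: AIC = 2k - 2 ln(L), where the maximised log-likelihood is -n/2 * (ln(2*pi) + ln(SSE/n) + 1). The function name gaussian_aic and the toy data are made up for illustration. I've assumed k counts the fitted coefficients including the intercept (which, as far as I know, is the convention statsmodels uses, but check it against regr.aic on your own data). Note that the question's formula takes log(sse) without dividing by n or scaling by n, and that np.log gives the "divide by zero" warning whenever sse is exactly 0, so the sketch guards against that case.

```python
import math

def gaussian_aic(y, y_pred, k):
    """AIC of a least-squares fit under Gaussian errors.

    y, y_pred: 1-D sequences of equal length (observed and fitted values).
    k: number of estimated coefficients, counting the intercept (assumption).
    """
    y = list(y)
    y_pred = list(y_pred)
    if len(y) != len(y_pred):
        raise ValueError("y and y_pred must have the same length")
    n = len(y)
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_pred))
    if sse <= 0:
        # log(0) is what triggers the "divide by zero" warning with np.log
        raise ValueError("perfect fit: SSE is 0, so AIC is undefined")
    # Maximised Gaussian log-likelihood with variance estimate SSE / n
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(sse / n) + 1)
    return 2 * k - 2 * log_lik

# Illustrative data only
y_obs = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.1, 1.9, 3.2, 3.8]
print(gaussian_aic(y_obs, y_fit, k=2))  # k=2: slope + intercept
```

    With sklearn you would pass regr.predict(X) as y_pred (flattened to 1-D if needed) and k = number of features + 1 when fit_intercept is True.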