python statistics regression linear-regression statsmodels

Statsmodels linear_harvey_collier test - ValueError


I was using linear_harvey_collier to test whether my model is linear, but I don't understand what causes this error.

import statsmodels.formula.api as smf
import statsmodels.stats.diagnostic as sms    

QGDP = df['QUARTERLY REAL GDP']
UNRATE = df['UNRATE(%)']
ols_1 = smf.ols(formula='QGDP ~ UNRATE', data = df)
ols_fit = ols_1.fit()
test = sms.linear_harvey_collier(ols_fit)

ValueError: "The initial regressor matrix, x[:skip], issingular. You must use a value of skip large enough to ensure that the first OLS estimator is well-defined.


Solution

  • This test performs its own regression, and the error tells you that the regression cannot be performed as the regressor matrix is non-invertible.

    Your model has one regressor plus the constant term (added automatically because you are using a patsy formula string), so the regressor matrix has 2 columns. With the default skip=None, the test sets skip to the number of columns, so the initial regression uses only the first 2 observations, i.e. a 2 by 2 matrix. That matrix is singular whenever its two rows are identical, which happens when the first two UNRATE values are equal.

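    For example, if the first two values of the UNRATE column are both 0, then the matrix (with the constant term) becomes

    a = np.array([[1, 0],
                  [1, 0]])


    and np.linalg.inv(np.dot(a.T, a)) raises LinAlgError, because np.dot(a.T, a) = [[2, 0], [0, 0]] is singular. Any two equal values, not just zeros, give the same rank-deficient matrix.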


    So, as the message instructs, pass a larger value for the skip= parameter (the number of observations used for the initial regression of the test) so that the initial regressor matrix x[:skip] has full column rank. skip must be at least the number of columns in the matrix, and larger still if the leading rows repeat.
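
As a rough illustration of the fix described above, here is a minimal sketch on synthetic data. It is not the asker's dataset: the numbers are made up so that the first two UNRATE values coincide and the default skip fails. The helper smallest_full_rank_skip is hypothetical (it is not part of statsmodels), and the sketch assumes a statsmodels version in which linear_harvey_collier accepts the skip= argument.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.diagnostic as smd

# Synthetic data, invented for illustration: the first two UNRATE values
# are identical, so the first two rows of the design matrix are both [1, 5.0].
rng = np.random.default_rng(0)
unrate = np.concatenate([[5.0, 5.0], rng.uniform(3.0, 10.0, size=38)])
qgdp = 100.0 - 2.0 * unrate + rng.normal(scale=1.0, size=unrate.size)
df = pd.DataFrame({"QGDP": qgdp, "UNRATE": unrate})

ols_fit = smf.ols("QGDP ~ UNRATE", data=df).fit()
exog = ols_fit.model.exog  # design matrix with columns Intercept, UNRATE

# With the default skip=None the test starts from the first 2 rows,
# which have rank 1 here, so the call is expected to raise ValueError.
try:
    smd.linear_harvey_collier(ols_fit)
    default_failed = False
except ValueError:
    default_failed = True


def smallest_full_rank_skip(x):
    """Hypothetical helper: smallest skip such that x[:skip] has full column rank."""
    n_cols = x.shape[1]
    for skip in range(n_cols, x.shape[0]):
        if np.linalg.matrix_rank(x[:skip]) == n_cols:
            return skip
    raise ValueError("no leading block of rows has full column rank")


skip = smallest_full_rank_skip(exog)
result = smd.linear_harvey_collier(ols_fit, skip=skip)
print("default skip failed:", default_failed)
print("skip used:", skip)
print(result)
```

Checking the rank of exog[:skip] before calling the test mirrors the condition named in the error message, so the chosen value is the smallest one for which the initial regression is well-defined. Whether a larger skip is preferable for a given dataset is a separate modelling question.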