I was using linear_harvey_collier to test whether my model is linear, but I don't understand what causes this error.
import statsmodels.formula.api as smf
import statsmodels.stats.diagnostic as sms
QGDP = df['QUARTERLY REAL GDP']
UNRATE = df['UNRATE(%)']
ols_1 = smf.ols(formula='QGDP ~ UNRATE', data = df)
ols_fit = ols_1.fit()
test = sms.linear_harvey_collier(ols_fit)
ValueError: "The initial regressor matrix, x[:skip], issingular. You must use a value of skip large enough to ensure that the first OLS estimator is well-defined.
This test performs its own regression, and the error tells you that the regression cannot be performed as the regressor matrix is non-invertible.
Since your model has only one regressor plus the constant term (added automatically because you are using a patsy formula string), leaving the skip= parameter of the test function at its default value of None tells the test to use a 2 by 2 matrix for the initial regression (2 variables, and 2 observations to match the number of variables). That matrix is non-invertible because of the values it contains.
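For example, if the first two values of the UNRATE column are both 0's, then the matrix (with the constant term) becomes
a = [[1, 0],
     [1, 0]]
and np.linalg.inv(np.dot(a.T, a)) is undefined, because a.T a is singular. The same happens whenever the first two values are equal, not only when they are 0: the two rows are then identical and the matrix has rank 1.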
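The singularity can be checked directly with NumPy. This is only a small sketch: the matrices `a` and `b` are made-up stand-ins for the first two rows of the design matrix, not values taken from your data.

```python
import numpy as np

# Hypothetical first two rows of the design matrix:
# a constant column plus two UNRATE values that are both 0.
a = np.array([[1.0, 0.0],
              [1.0, 0.0]])

gram = a.T @ a                       # [[2, 0], [0, 0]]
rank_a = np.linalg.matrix_rank(a)    # 1, i.e. less than the 2 columns

try:
    np.linalg.inv(gram)
    inverted = True
except np.linalg.LinAlgError:
    # NumPy refuses to invert an exactly singular matrix.
    inverted = False

# Two equal non-zero values give identical rows as well, so the rank is still 1.
b = np.array([[1.0, 4.2],
              [1.0, 4.2]])
rank_b = np.linalg.matrix_rank(b)

print(rank_a, inverted, rank_b)
```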
So, as the message instructs, you need to specify a large enough value of the skip= parameter (i.e. how many observations to use for the initial model of the test) to ensure a sufficient rank of the regressor matrix: skip must be at least the number of columns in the matrix, and the first skip rows must have rank equal to that number of columns.
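One way to pick such a value is to look for the first prefix of the regressor matrix that has full column rank. The helper below is a hypothetical sketch, not part of statsmodels, and the `unrate` values are invented for illustration. It assumes the test only needs the first skip rows to have full column rank, as the error message suggests.

```python
import numpy as np

def smallest_valid_skip(exog):
    """Return the smallest n such that exog[:n] has full column rank.

    Hypothetical helper, not a statsmodels function.
    """
    exog = np.asarray(exog, dtype=float)
    nobs, nvars = exog.shape
    # Fewer than nvars rows can never reach rank nvars, so start there.
    for n in range(nvars, nobs + 1):
        if np.linalg.matrix_rank(exog[:n]) == nvars:
            return n
    raise ValueError("the regressor matrix never reaches full column rank")

# Made-up regressor whose first three values are identical.
unrate = np.array([0.0, 0.0, 0.0, 3.5, 3.7, 4.1])
exog = np.column_stack([np.ones_like(unrate), unrate])  # constant + UNRATE

skip = smallest_valid_skip(exog)
print(skip)
```

With the fitted model from the question, the design matrix should be available as ols_fit.model.exog, so the call would look something like sms.linear_harvey_collier(ols_fit, skip=smallest_valid_skip(ols_fit.model.exog)). A somewhat larger skip than this minimum may give a more stable initial estimate.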