What is the method of regression used in adfuller()?
I'm performing an augmented Dickey-Fuller test on a time series, and I'm trying two different ways of doing it.
First, I use the pandas diff() method to get the change in price, dy. Then I pass the original time series y as the independent variable and dy as the dependent variable into statsmodels.OLS(dy, y) and get the results. From those I extract the slope parameter, model.params[1], and its standard error, model.bse[1]. The quotient of these terms is the Dickey-Fuller test statistic, which I call DF = model.params[1]/model.bse[1].
Second, I pass the single price time series into adfuller() as follows:
adfstat, pvalue, critvalues, resstore = ts.adfuller(y.y,regression='c',store=True,regresults=True)
Now, to get the Dickey-Fuller test statistic, I simply use
DF = resstore.tvalues[1]
Using OLS I get:
DF = -1.81495580198
With adfuller():
DF = -1.56386414181
What is the difference between these two methods? Does adfuller() perform a different linear regression than OLS internally? I've confirmed that the OLS results are correct against a book that I'm taking examples from. But I prefer to use adfuller() because it provides the critical values for the test statistic as part of its output. Additionally, the adfuller() result seems to contain many regression coefficients:
print(resstore.resols.params) ==>
[-0.00491391 0.02366782 -0.00295179 0.01354619 0.06399901 -0.06018851
-0.00328142 -0.03876784 0.02934003 -0.10224276 0.00227549 0.01042279
-0.04627873 0.05503934 -0.02707106 0.02664511 -0.02428741 0.04894767
-0.06206492 0.00508655]
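As far as I can tell, the long params array comes from lagged terms rather than a polynomial fit: with regression='c', statsmodels orders the columns as the lagged level y_{t-1}, then p lagged differences dy_{t-1}, ..., dy_{t-p}, then the constant, so 20 coefficients would correspond to p = 18 lagged differences. The numpy-only sketch below builds that design matrix and computes the OLS t-statistic on the lagged level. The helper names adf_design and adf_tstat are hypothetical, and the code is an illustration of the regression, not statsmodels' own implementation.

```python
import numpy as np

def adf_design(y, p):
    """Build (target, X) for dy_t = lam*y_{t-1} + sum_i g_i*dy_{t-i} + c + e_t,
    with columns ordered [y_{t-1}, dy_{t-1}, ..., dy_{t-p}, const]."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                  # dy[k] = y[k+1] - y[k]
    n = len(dy) - p                  # rows left after losing p lags
    target = dy[p:]                  # dy_t
    level = y[p:-1]                  # y_{t-1}, aligned with target
    # Lagged differences dy_{t-i} for i = 1..p, each of length n.
    lags = [dy[p - i:len(dy) - i] for i in range(1, p + 1)]
    X = np.column_stack([level, *lags, np.ones(n)])
    return target, X

def adf_tstat(y, p):
    """OLS t-statistic on the lagged-level coefficient (column 0)."""
    target, X = adf_design(y, p)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[0] / np.sqrt(cov[0, 0])
```

With p = 0 this reduces to the plain Dickey-Fuller regression of dy_t on y_{t-1} and a constant; each extra lag adds one column and costs one observation, which is why the two statistics differ.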
I determine the half-life for mean reversion from the slope of the regression line. It looks like adfuller() is computing a 20th-order regression here? That doesn't seem right. Maybe I'm doing this wrong, though? Can somebody shed some light on adfuller()?
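For the half-life, a common approximation treats the lagged-level coefficient lam from dy_t = c + lam*y_{t-1} + e_t as a continuous decay rate, so deviations from the mean shrink like exp(lam*t) and the half-life is -ln(2)/lam, which is only meaningful for lam < 0. In adfuller()'s stored regression, lam is the first entry, resstore.resols.params[0], rather than index 1, and its value changes when lagged differences are included. The function names below are hypothetical.

```python
import math

def half_life(lam):
    """Half-life from the lagged-level coefficient, assuming dy = lam * y dt."""
    if lam >= 0:
        raise ValueError("no mean reversion: coefficient must be negative")
    return -math.log(2) / lam

def half_life_discrete(lam):
    """Discrete-time version: deviations scale by (1 + lam) each period."""
    if not -1 < lam < 0:
        raise ValueError("coefficient must lie in (-1, 0)")
    return math.log(0.5) / math.log(1 + lam)

print(half_life(-0.1), half_life_discrete(-0.1))
```

For small |lam| the two formulas give close results.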
This can be solved by setting maxlag=1 in the call to adfuller(), which limits how many lagged differences of the series it adds to the regression.