Tags: python, statistics, time-series, statsmodels, arima

The result of the ADF test doesn't match the result of ndiffs from pmdarima.arima


I tried to fit an ARIMA model to a time-series dataset (S&P 500 stocks).

Before feeding the data to the ARIMA model, I wanted to know whether the time series is stationary.

So I chose the stock with the ticker "APA" (Apache Corporation) and used adfuller from the statsmodels.tsa.stattools package to test whether the time series is stationary.

I also used ndiffs from the pmdarima.arima package to find a suitable differencing order for the ARIMA model (to my understanding, setting this order on the ARIMA model should make the time series stationary).

The p-value from adfuller is greater than 0.05, so I concluded that the time series is not stationary (I based this conclusion on the question "How to interpret adfuller test results?").

But the result of ndiffs is 0.

To my understanding, this is a little bit weird, because adfuller indicates that the time series is not stationary, while ndiffs indicates that there is no need to set the ARIMA differencing term.

My question is: shouldn't the result of ndiffs be greater than 0 if the time series is not stationary?

Test code: [image not available]

Dataset: https://www.kaggle.com/hanseopark/prediction-of-price-for-ml-with-finance-stats/data

Complete code: https://gist.github.com/bab6426c0e8a10472c924755c1f5ff67.git


Solution

  • The functions from pmdarima are great but not infallible, and the outcome really depends on your data. Differencing is a great way to make data stationary, but sometimes it does not work. It is usually used to remove trends in the data, while seasonal differencing is used to remove seasonality.

    Stock prices or indices like the S&P are not seasonal, and even trends are hard to detect or quantify. Instead, such time series often have a lot of irregularities, ups and downs, and so on. In that case you might need to apply a logarithm (or a combination of transformations, or something else) to make the data stationary, and such things can't always be detected by pmdarima or even by the ADF test. They are great tools, but you cannot fully rely on them.

    The logarithm solution would be something like this:

    import numpy
    your_dataframe["your_stationary_feature"] = numpy.log(your_dataframe["your_feature"])
    

    If you predict logarithmic data, then it needs to be inverted:

    inverted_data = numpy.exp(your_predictions)
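
To make the transform and its inversion concrete, here is a minimal sketch of one possible "combination": a logarithm followed by first differencing. The price values are made up for illustration, and the differencing step is an assumption on my part, not something the answer above prescribes. The "predictions" are just the transformed data reused, so the example only shows that the round trip recovers the original scale.

```python
import numpy as np

# Hypothetical price series; in practice this would be a column such as
# your_dataframe["your_feature"].
prices = np.array([100.0, 102.0, 101.0, 105.0, 110.0])

# Step 1: the log transform compresses large swings in the price level.
log_prices = np.log(prices)

# Step 2 (assumption, not in the answer above): difference the logs to get
# log returns, which removes the level/trend component.
log_returns = np.diff(log_prices)

# Stand-in for model output: we simply reuse the log returns here.
predicted_log_returns = log_returns

# Invert in reverse order: the cumulative sum undoes the differencing,
# exp undoes the log, and the first observed price restores the level.
reconstructed = prices[0] * np.exp(np.cumsum(predicted_log_returns))

# reconstructed now matches prices[1:] up to floating-point error.
```

Note that differencing drops the first observation, so the inverted series needs the first original price as a starting point. If only the logarithm is applied, `numpy.exp` alone is enough to invert the predictions.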