I tried to use an ARIMA model on a time-series dataset (S&P 500 stocks).
Before feeding the data to the ARIMA model, I wanted to know whether the time series is stationary.
So I chose the stock with the ticker "APA" (Apache Corporation) and used adfuller from the statsmodels.tsa.stattools package to test whether the time series is stationary.
I also used ndiffs from the pmdarima.arima package to find a suitable differencing order d for the ARIMA model (to my understanding, setting this order in the ARIMA model should make the time series stationary).
The p-value of adfuller is greater than 0.05, so I concluded that the time series is not stationary (I found this interpretation here: How to interpret adfuller test results?).
But the result of ndiffs is 0.
This seems a little odd to me, because adfuller shows that the time series is not stationary, while ndiffs shows that no differencing term is needed for ARIMA.
My question is: shouldn't the result of ndiffs be greater than 0 if the time series is not stationary?
Dataset: https://www.kaggle.com/hanseopark/prediction-of-price-for-ml-with-finance-stats/data
Complete code: https://gist.github.com/bab6426c0e8a10472c924755c1f5ff67.git
The functions from pmdarima are great but not infallible. Also, much depends on your data. Differencing is a great way to make data stationary, but sometimes it does not work. It is usually used to remove trends in the data, while seasonal differencing is used to remove seasonality.
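To make the trend and seasonality point concrete, here is a sketch on synthetic data. The linear trend, the seasonal period of 12, the noise level, and the helper name seasonal_diff are all arbitrary assumptions: first differencing turns a linear trend into a constant, and seasonal differencing at lag m cancels a pattern that repeats every m steps.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(120)
period = 12  # assumed seasonal period (e.g. monthly data)
trend = 0.5 * t
season = 10 * np.sin(2 * np.pi * t / period)
y = trend + season + rng.normal(scale=0.1, size=t.size)

# First difference y[t] - y[t-1]: the linear trend 0.5*t becomes the constant 0.5,
# but the seasonal pattern is still present in d1.
d1 = np.diff(y)


def seasonal_diff(x, m):
    # Seasonal difference y[t] - y[t-m]: a pattern with period m cancels out,
    # and the linear trend becomes the constant 0.5 * m.
    return x[m:] - x[:-m]


ds = seasonal_diff(y, period)
print(d1[:5].round(2))
print(ds.mean().round(2), ds.std().round(2))  # mean close to 6.0 (= 0.5 * 12), small spread
```

A stock price has neither a clean linear trend nor a fixed period like this, which is why differencing alone is not always enough there.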
Stock prices or indices like the S&P 500 are not seasonal, and even trends are hard to detect or quantify. Instead, such time series often have a lot of irregularities (ups and downs, etc.). In that case you might need to apply a logarithm (or a combination of transformations, such as differencing the logarithm to get log returns, or something else) to make the data stationary, and such things can't always be detected by pmdarima or even by the ADF test. They are great tools, but you cannot fully rely on them.
The logarithm solution would be something like this:
import numpy
your_dataframe["your_stationary_feature"] = numpy.log(your_dataframe["your_feature"])
If you predict logarithmic data, the predictions need to be transformed back with the exponential:
inverted_data = numpy.exp(your_predictions)
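Here is a self-contained version of that snippet with a made-up price column ("close" is a placeholder, like the names above). It also shows the combination mentioned earlier: differencing the logarithm gives log returns, which can be turned back into prices with a cumulative sum starting from a known log price, followed by numpy.exp.

```python
import numpy as np
import pandas as pd

# Made-up prices; "close" is a placeholder column name.
df = pd.DataFrame({"close": [20.0, 21.5, 20.8, 22.3, 23.1]})

# Log transform (prices must be strictly positive).
df["log_close"] = np.log(df["close"])

# Log returns: first difference of the log prices; the first row becomes NaN.
df["log_return"] = df["log_close"].diff()

# Inverting a log-scale value: exp undoes log.
recovered = np.exp(df["log_close"])

# Inverting log returns: start from the first known log price,
# add up the returns, then exponentiate.
first_log = df["log_close"].iloc[0]
rebuilt = np.exp(first_log + df["log_return"].fillna(0.0).cumsum())

print(df.round(4))
print(np.allclose(recovered, df["close"]), np.allclose(rebuilt, df["close"]))  # True True
```

If the model is fit on log returns rather than log prices, its forecasts would be cumulated onto the last observed log price in the same way before applying numpy.exp.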