I am working with ARIMA. To make the data stationary, I log-transformed it and then differenced it by subtracting the shifted series. When I tested it again with a rolling mean and the ADF test (adfuller), I got a p-value of 0.000000. Why is that?
My code:
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller
import matplotlib.pyplot as plt
# df holds the daily price data (loaded elsewhere); first two rows:
# Date        Open      High      Low       Close     AdjClose  Volume
# 2010-06-30  5.158000  6.084000  4.660000  4.766000  4.766000  85935500
# 2010-07-01  5.000000  5.184000  4.054000  4.392000  4.392000  41094000
df['Date'] = pd.to_datetime(df['Date'])  # infer_datetime_format is deprecated in recent pandas
df=df.set_index(['Date'])
def test_ad(values):
    # 12-period rolling mean and standard deviation
    mvm = values.rolling(window=12).mean()
    mvstd = values.rolling(window=12).std()
    # Plot the series together with its rolling statistics
    orig = plt.plot(values, color='blue', label='org')
    mean = plt.plot(mvm, color='red', label='mvm')
    std = plt.plot(mvstd, color='black', label='mvstd')
    plt.legend(loc='best')
    plt.show(block=False)
    # Augmented Dickey-Fuller test
    result = adfuller(values)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical Values:')
    #labels = ['ADF Test Statistic','p-value','#Lags Used','Number of Observations Used']
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))
    if result[1] <= 0.05:
        print("Data is stationary")
    else:
        print("non-stationary ")
test_ad(df['Close'])
which gives:
ADF Statistic: 6.450459
p-value: 1.000000
Critical Values:
1%: -3.433
5%: -2.863
10%: -2.567
df['log']=np.log(df["Close"])
df['close']=df['log']-df['log'].shift()
#df['close']=df['log'].diff()
test_ad(df['close'].dropna())
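The commented-out diff() line is equivalent to the shift() subtraction, and both equal the log of the ratio of consecutive closes (the log return). A quick check, using only the two sample closing prices shown in the question (so there is just one non-NaN difference); it assumes those values are the Close column:

```python
import numpy as np
import pandas as pd

# The two Close values from the sample rows in the question
close = pd.Series([4.766, 4.392],
                  index=pd.to_datetime(['2010-06-30', '2010-07-01']))

log_close = np.log(close)
via_shift = log_close - log_close.shift()   # as in the question
via_diff = log_close.diff()                 # the commented-out alternative
via_ratio = np.log(close / close.shift())   # log return: log(p_t / p_{t-1})

# The first element is NaN for all three, which is why dropna() is needed
print(pd.DataFrame({'shift': via_shift, 'diff': via_diff, 'ratio': via_ratio}))
```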
which gives:
ADF Statistic: -50.361617
p-value: 0.000000
Critical Values:
1%: -3.433
5%: -2.863
10%: -2.567
The plot looks stationary, and the ADF statistic is below all of the critical values, as you can see above.
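You can see for yourself that your ADF statistic (-50.36) is MUCH lower than the 1% critical value (-3.433). The ADF test rejects its null hypothesis of a unit root when the statistic is more negative than the critical value, so a statistic this extreme simply means your p-value is extremely small.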
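The decision rule can be written out directly: the test is left-tailed, so the unit-root null is rejected at a given level when the statistic is below the corresponding critical value. A minimal sketch using the numbers printed in the question; the helper name rejected_levels is hypothetical, not part of statsmodels:

```python
def rejected_levels(adf_stat, critical_values):
    """Return the significance levels at which the unit-root null is rejected.

    The ADF test is left-tailed: the null (unit root, non-stationary)
    is rejected when the statistic is smaller than the critical value.
    """
    return [level for level, crit in critical_values.items() if adf_stat < crit]

# Critical values as printed in the question (result[4] from adfuller)
crit = {'1%': -3.433, '5%': -2.863, '10%': -2.567}

print(rejected_levels(6.450459, crit))    # original Close series -> []
print(rejected_levels(-50.361617, crit))  # differenced log series -> ['1%', '5%', '10%']
```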
What makes it confusing is that you print the value with %f, which by default (i.e. without a precision such as %.2f for 2 decimals or %.10f for 10 decimals) shows only 6 digits after the decimal point, so anything smaller than 0.0000005 is displayed as 0.000000.
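To see the rounding in isolation, here is a quick check with a made-up p-value (3.2e-12 is hypothetical, not the value from the question). Fixed-point formatting rounds it to zero even with 10 decimals, while scientific notation (%e) keeps it visible:

```python
p = 3.2e-12  # hypothetical, very small p-value

print('p-value: %f' % p)     # 6 decimals by default -> p-value: 0.000000
print('p-value: %.10f' % p)  # 10 decimals still rounds it away -> p-value: 0.0000000000
print('p-value: %.2e' % p)   # scientific notation -> p-value: 3.20e-12
```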
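If you print the value in full, either by formatting it as a string with print('p-value: %s' % result[1]) (so no precision is needed) or with an f-string, print(f'p-value: {result[1]}'), you will see the unrounded number, which is normally a very small positive value rather than exactly 0.
One caveat: statsmodels computes this p-value with MacKinnon's approximation, which, as far as I know, is clipped to exactly 0.0 when the statistic lies far outside the range of its tables, so with a statistic of -50 the full value may genuinely print as 0.0. Either way the interpretation is the same: the unit-root null is rejected, and the differenced log series can be treated as stationary.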
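If you want one format that works for both ordinary and tiny p-values, the g presentation type keeps a fixed number of significant digits and switches to scientific notation automatically for small numbers. A sketch with hypothetical values; format_p is an illustrative helper, not a library function:

```python
def format_p(p):
    # '.3g' keeps 3 significant digits and switches to scientific
    # notation for very small (or very large) numbers
    return f'{p:.3g}'

for p in (0.0342, 3.2e-12, 1.0):  # hypothetical p-values
    print('p-value:', format_p(p))
# p-value: 0.0342
# p-value: 3.2e-12
# p-value: 1
```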