python, time-series, arima, p-value

Why am I getting p-value 0.00000 in adfuller test?


I am working with ARIMA. To make the data stationary, I took the log of the series and then differenced it by subtracting the shifted values. When I tested the result again with a rolling mean and the adfuller test, I got a p-value of 0.0000. Why is that?

My code:

import numpy as np 
import pandas as pd 
from statsmodels.tsa.stattools import adfuller
import matplotlib.pyplot as plt
# df is a DataFrame of daily prices loaded elsewhere; its first rows look like:
# Date        open         high         low        close       adjclose    Volume
# 2010-06-30  5.158000    6.084000    4.660000    4.766000    4.766000    85935500
# 2010-07-01  5.000000    5.184000    4.054000    4.392000    4.392000    41094000
df['Date']=pd.to_datetime(df['Date'], infer_datetime_format=True)
df=df.set_index(['Date'])
def test_ad(values):
    mvm = values.rolling(window=12).mean()
    mvstd = values.rolling(window=12).std()
    orig = plt.plot(values,color='blue',label='org')
    mean = plt.plot(mvm,color='red',label='mvm')
    std=plt.plot(mvstd,color='black',label='mvstd')
    plt.legend(loc='best')
    plt.show(block=False)
    result=adfuller(values)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical Values:')
    #labels = ['ADF Test Statistic','p-value','#Lags Used','Number of Observations Used']
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))
    if result[1] <= 0.05:
        print("Data is stationary")
    else:
        print("non-stationary ")

test_ad(df['Close'])

which gives:

ADF Statistic: 6.450459
p-value: 1.000000
Critical Values:
    1%: -3.433
    5%: -2.863
    10%: -2.567


df['log']=np.log(df["Close"])
df['close']=df['log']-df['log'].shift()
#df['close']=df['log'].diff()
test_ad(df['close'].dropna())

which gives:

ADF Statistic: -50.361617
p-value: 0.000000
Critical Values:
    1%: -3.433
    5%: -2.863
    10%: -2.567

The graph looks stationary, and the ADF statistic is well below all of the critical values shown above.


Solution

  • You can see for yourself that your ADF statistic (-50.36) is far below the 1% critical value (-3.433), so your p-value is simply extremely small.

    What makes it confusing is that you are printing the value with %f, which by default (that is, without an explicit precision such as %.2f for 2 decimals or %.10f for 10 decimals) shows only 6 digits after the decimal point. Anything that rounds to less than 0.000001 is therefore displayed as 0.000000.

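    If you print the value in full, for example with print('p-value: %s' % result[1]), which formats the float via str() so no precision is needed, or with an f-string such as print(f'p-value: {result[1]}'), you will see every digit Python keeps. A p-value shown as 0.000000 is normally a very small positive number; with a statistic as extreme as -50, the approximation statsmodels uses may even return exactly 0.0. Either way the conclusion is the same: the unit-root null hypothesis is rejected.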
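
A minimal sketch of the formatting effect described above. The p-value 3.2e-25 is a made-up number chosen only for illustration (it is not what adfuller returned for the data in the question); the statistic and critical values are the ones quoted in the question's output.

```python
# Hypothetical tiny p-value, assumed for illustration only.
p_value = 3.2e-25

fixed = 'p-value: %f' % p_value     # %f defaults to 6 decimals
as_str = 'p-value: %s' % p_value    # %s formats the float via str()
sci = f'p-value: {p_value:.3e}'     # scientific notation, 3 decimals

print(fixed)    # p-value: 0.000000
print(as_str)   # p-value: 3.2e-25
print(sci)      # p-value: 3.200e-25

# Statistic and critical values copied from the question's output.
adf_stat = -50.361617
critical_values = {'1%': -3.433, '5%': -2.863, '10%': -2.567}

# The unit-root null is rejected at a level when the statistic
# lies below that level's critical value.
rejected = {level: adf_stat < cv for level, cv in critical_values.items()}
print(rejected)  # {'1%': True, '5%': True, '10%': True}
```

Printing with %e or .3e is often the most readable choice for p-values, since it shows the order of magnitude without a long run of zeros.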