Tags: python-3.x, time-series, statsmodels, forecasting, arima

Problem with number of lags in statsmodels acf plot and pacf plot


I am testing some code from online tutorials, and I have problems reproducing the results for statsmodels' 'plot_acf' and 'plot_pacf'.

For example, take this example. Using exactly the same code, I obtain this.

Another example: using the same code, I obtain this.

The plots always stop at a maximum of 20 lags. Is that the default value of some parameter?

Neither tutorial specifies any parameter other than the time-series values: plot_acf(series)

When I specify the number of lags myself, it works up to a certain value. If I increase lags beyond that value, I get this error:

"Can only compute partial correlations for lags up to 50% of the sample size."

Can anyone explain how I can reproduce the same results?

I am using statsmodels version: 0.12.2

The code is simple:

from pandas import read_csv
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
from matplotlib import pyplot
series = read_csv('stationary.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
print(series)
pyplot.figure()
pyplot.subplot(211)
plot_acf(series,ax=pyplot.gca())
pyplot.subplot(212)
plot_pacf(series, ax=pyplot.gca())
pyplot.show()

Solution

  • I ran into similar problems, and I found this about "lags" in their documentation:

    If not provided, lags=np.arange(len(corr)) is used.

    I have no idea what "corr" refers to, as I can't find it on the doc page (it may refer to the correlations on the vertical Y axis?).

    In my case, with 36xx rows of data, the default gives me 36 lags, i.e. about 1% of the rows. I also tried a sample of 2000 rows and got the same default of 36.

    After reading this official GitHub thread: https://github.com/statsmodels/statsmodels/issues/4663 it looks like the author made a sensible change to the default in a newer version, which leads to your scenario.

    In the examples you provided, both of your own outputs suggest that the truncated lags, those beyond the far-right data point, are statistically insignificant (far below the shaded boundary), so you should not worry about getting an exact replica.

    I also noticed that the input parameters and defaults used by statsmodels change slightly over time.
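
To make the default concrete, here is a minimal sketch of how the number of lags seems to be chosen when `lags` is omitted. It assumes the rule adopted after the GitHub issue above is min(ceil(10 * log10(nobs)), nobs - 1), and that the PACF only accepts a lag count strictly below half the sample size, which is what the error message suggests. The helper names `default_plot_lags` and `max_pacf_lags` are mine, not part of statsmodels, so check the source of your installed version before relying on the exact formulas.

```python
import math


def default_plot_lags(nobs: int) -> int:
    """Largest lag drawn when `lags` is not passed.

    Assumed rule: min(ceil(10 * log10(nobs)), nobs - 1).
    """
    return min(math.ceil(10 * math.log10(nobs)), nobs - 1)


def max_pacf_lags(nobs: int) -> int:
    """Largest lag the PACF should accept.

    Assumption: the lag count must be strictly below nobs // 2,
    as the "50% of the sample size" error message implies.
    """
    return nobs // 2 - 1


if __name__ == "__main__":
    for nobs in (100, 3600):
        print(nobs, default_plot_lags(nobs), max_pacf_lags(nobs))
```

Under these assumptions a series of about 100 observations gets 20 lags by default, which matches the maximum you are seeing. To get plots as wide as the tutorials', pass the lag count explicitly, for example plot_acf(series, lags=40) and plot_pacf(series, lags=40), as long as the value stays within the PACF limit for the length of your series.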