
Problems with ACF plots and operands


This code used to run fine, but it no longer does and I don't know why. It fails when drawing the second ACF plot.

import numpy as np, pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
plt.rcParams.update({'figure.figsize':(9,7), 'figure.dpi':120})

# Import data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/wwwusage.csv', 
             names=['value'], header=0)

# Original Series
fig, axes = plt.subplots(2, 2, sharex=True)
axes[0, 0].plot(df.value); axes[0, 0].set_title('Original Series')
plot_acf(df.value, ax=axes[0, 1], lags = np.arange(len(df)))

# 1st Differencing
axes[1, 0].plot(df.value.diff()); axes[1, 0].set_title('1st Order Differencing')
plot_acf(df.value.diff().dropna(), ax=axes[1, 1], lags = np.arange(len(df)))
plt.show()

Output: ValueError: operands could not be broadcast together with shapes (98,) (97,) (98,) 

Does anyone know how to interpret this error? I suspect that dropna() is causing it, but I am not sure. As I said, it used to work fine and the data is still the same. I have updated my statsmodels library, but I doubt that this is the reason for my problem. Thank you in advance.


Solution

  • The error comes from the second ACF plot. The differenced series has 99 observations, but lags = np.arange(len(df)) is built from the original series, which has 100 observations, so it requests lags 0 through 99. A series with 99 observations only has autocorrelations for lags 0 through 98, so the largest requested lag is out of range.

    To resolve the issue you need to replace lags = np.arange(len(df)) with lags = np.arange(len(df) - 1) in the second ACF plot.

    Note that taking first differences costs you one observation: diff() sets the first value to NaN because it has no predecessor. After removing that missing value with dropna(), the series is one observation shorter than the original.

    import numpy as np, pandas as pd
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
    import matplotlib.pyplot as plt
    plt.rcParams.update({'figure.figsize':(9,7), 'figure.dpi':120})
    
    # Import data
    df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/wwwusage.csv', names=['value'], header=0)
    
    # Original Series
    fig, axes = plt.subplots(2, 2, sharex=True)
    axes[0, 0].plot(df.value); axes[0, 0].set_title('Original Series')
    plot_acf(df.value, ax=axes[0, 1], lags=np.arange(len(df)))
    
    # 1st Differencing
    axes[1, 0].plot(df.value.diff()); axes[1, 0].set_title('1st Order Differencing')
    plot_acf(df.value.diff().dropna(), ax=axes[1, 1], lags=np.arange(len(df) - 1))
    plt.show()
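
A way to avoid this kind of off-by-one is to build the lags from the series you are actually plotting rather than from df. The snippet below is only a sketch: acf_lags is a hypothetical helper name (not part of statsmodels), and the data is a synthetic 100-point stand-in for the CSV so that it runs without a download.

```python
import numpy as np
import pandas as pd

def acf_lags(series):
    """Hypothetical helper: lags 0 .. n-1 for the series being plotted."""
    return np.arange(len(series))

# Synthetic stand-in for the 100-row dataset (assumption, not the real data)
values = pd.Series(np.arange(100, dtype=float) ** 2)

# diff() puts NaN in the first position; dropna() removes it
diffed = values.diff().dropna()

lags_original = acf_lags(values)
lags_diffed = acf_lags(diffed)

print(len(values), len(diffed))                # 100 99
print(lags_original.max(), lags_diffed.max())  # 99 98
```

With the real data, the second plot call would then be plot_acf(diffed, ax=axes[1, 1], lags=acf_lags(diffed)), which gives the same lags as np.arange(len(df) - 1) but keeps working if you difference more than once.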
    
    
