Search code examples
pythonstatisticstime-seriesstatsmodelsarima

How can I reproduce the statsmodels ARIMA filter?


I am attempting to reproduce the filters used in an ARIMA model using stastmodels recursive_filter and convolution_filter. (My end objective is to use these filters for prewhitening an exogenous series.)

I begin by working with just an AR model and recursive filter. Here is the simplified experimental setup:

import numpy as np
import statsmodels as sm

np.random.seed(42)

# sample data
series = sm.tsa.arima_process.arma_generate_sample(ar=(1,-0.2,-0.5), ma=(1,), nsample=100)

model = sm.tsa.arima.model.ARIMA(series, order=(2,0,0)).fit()
print(model.summary())

Which gracefully produces the following, which seems fair enough:

                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                  100
Model:                 ARIMA(2, 0, 0)   Log Likelihood                -131.991
Date:                Wed, 07 Apr 2021   AIC                            271.982
Time:                        12:58:39   BIC                            282.403
Sample:                             0   HQIC                           276.200
                                - 100                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.3136      0.266     -1.179      0.238      -0.835       0.208
ar.L1          0.2135      0.084      2.550      0.011       0.049       0.378
ar.L2          0.4467      0.101      4.427      0.000       0.249       0.645
sigma2         0.8154      0.126      6.482      0.000       0.569       1.062
===================================================================================
Ljung-Box (L1) (Q):                   0.10   Jarque-Bera (JB):                 0.53
Prob(Q):                              0.75   Prob(JB):                         0.77
Heteroskedasticity (H):               0.98   Skew:                            -0.16
Prob(H) (two-sided):                  0.96   Kurtosis:                         2.85
===================================================================================

I fit an AR(2) and obtain coefficients for lag 1 and 2 based on the SARIMAX results. My intuition for reproducing this model using statsmodels.tsa.filters.filtertools.recursive_filter is like so:

filtered = sm.tsa.filters.filtertools.recursive_filter(series, ar_coeff=(-0.2135, -0.4467))

(And maybe adding in the constant from the regression results as well). However, a direct comparison of the results shows the recursive filter doesn't replicate the AR model:

import matploylib.pyplot as plt

# ARIMA residuals
plt.plot(model.resid)

# Calculated residuals using recursive filter outcome
plt.plot(filtered)

Am I approaching this incorrectly? Should I be using a different filter function? The next step for me is to perform the same task but on an MA model so that I can add (?) the results together to obtain a full ARMA filter for prewhitening.

Note: this question may be valuable to somebody searching for "how can I prewhiten timeseries data?" particularly in Python using statsmodels.


Solution

  • I guess you should use convolution_filter for the AR part and recursive_filter for the MA part. Combining these sequentially will work for ARMA models. Alternatively, you can use arma_innovations for an exact approach that works with both AR and MA parts simultaneously. Here are some examples:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.tsa.innovations import arma_innovations
    

    AR(2)

    np.random.seed(42)
    series = sm.tsa.arma_generate_sample(ar=(1, -0.2, -0.5), ma=(1,), nsample=100)
    
    res = sm.tsa.arima.ARIMA(series, order=(2, 0, 0), trend='n').fit()
    print(pd.DataFrame({
        'ARIMA resid': res.resid,
        'arma_innovations': arma_innovations.arma_innovations(
            series, ar_params=res.params[:-1])[0],
        'convolution filter': sm.tsa.filters.convolution_filter(
            series, np.r_[1, -res.params[:-1]], nsides=1)}))
    

    gives:

        ARIMA resid  arma_innovations  convolution filter
    0      0.496714          0.496714                 NaN
    1     -0.254235         -0.254235                 NaN
    2      0.666326          0.666326            0.666326
    3      1.493315          1.493315            1.493315
    4     -0.256708         -0.256708           -0.256708
    ..          ...               ...                 ...
    95    -1.438670         -1.438670           -1.438670
    96     0.323470          0.323470            0.323470
    97     0.218243          0.218243            0.218243
    98     0.012264          0.012264            0.012264
    99    -0.245584         -0.245584           -0.245584
    

    MA(1)

    np.random.seed(42)
    series = sm.tsa.arma_generate_sample(ar=(1,), ma=(1, 0.2), nsample=100)
    
    res = sm.tsa.arima.ARIMA(series, order=(0, 0, 1), trend='n').fit()
    print(pd.DataFrame({
        'ARIMA resid': res.resid,
        'arma_innovations': arma_innovations.arma_innovations(
            series, ma_params=res.params[:-1])[0],
        'convolution filter': sm.tsa.filters.recursive_filter(series, -res.params[:-1])}))
    

    gives:

        ARIMA resid  arma_innovations    recursive filter
    0      0.496714          0.496714            0.496714
    1     -0.132893         -0.132893           -0.136521
    2      0.646110          0.646110            0.646861
    3      1.525620          1.525620            1.525466
    4     -0.229316         -0.229316           -0.229286
    ..          ...               ...                 ...
    95    -1.464786         -1.464786           -1.464786
    96     0.291233          0.291233            0.291233
    97     0.263055          0.263055            0.263055
    98     0.005637          0.005637            0.005637
    99    -0.234672         -0.234672           -0.234672
    

    ARMA(1, 1)

    np.random.seed(42)
    series = sm.tsa.arma_generate_sample(ar=(1, 0.5), ma=(1, 0.2), nsample=100)
    
    res = sm.tsa.arima.ARIMA(series, order=(1, 0, 1), trend='n').fit()
    a = res.resid
    
    # Apply the recursive then convolution filter
    tmp = sm.tsa.filters.recursive_filter(series, -res.params[1:2])
    filtered = sm.tsa.filters.convolution_filter(tmp, np.r_[1, -res.params[:1]], nsides=1)
    
    print(pd.DataFrame({
        'ARIMA resid': res.resid,
        'arma_innovations': arma_innovations.arma_innovations(
            series, ar_params=res.params[:1], ma_params=res.params[1:2])[0],
        'combined filters': filtered}))
    

    gives:

        ARIMA resid  arma_innovations    combined filters
    0      0.496714          0.496714                 NaN
    1     -0.134253         -0.134253           -0.136915
    2      0.668094          0.668094            0.668246
    3      1.507288          1.507288            1.507279
    4     -0.193560         -0.193560           -0.193559
    ..          ...               ...                 ...
    95    -1.448784         -1.448784           -1.448784
    96     0.268421          0.268421            0.268421
    97     0.212966          0.212966            0.212966
    98     0.046281          0.046281            0.046281
    99    -0.244725         -0.244725           -0.244725
    

    SARIMA(1, 0, 1)x(1, 0, 0, 3)

    The seasonal model is a little more complicated, because it requires multiplying lag polynomials. For additional details, see this example notebook from the Statsmodels documentation.

    np.random.seed(42)
    ar_poly = [1, -0.5]
    sar_poly = [1, 0, 0, -0.1]
    ar = np.polymul(ar_poly, sar_poly)
    series = sm.tsa.arma_generate_sample(ar=ar, ma=(1, 0.2), nsample=100)
    
    res = sm.tsa.arima.ARIMA(series, order=(1, 0, 1), seasonal_order=(1, 0, 0, 3), trend='n').fit()
    a = res.resid
    
    # Apply the recursive then convolution filter
    tmp = sm.tsa.filters.recursive_filter(series, -res.polynomial_reduced_ma[1:])
    filtered = sm.tsa.filters.convolution_filter(tmp, res.polynomial_reduced_ar, nsides=1)
    
    print(pd.DataFrame({
        'ARIMA resid': res.resid,
        'arma_innovations': arma_innovations.arma_innovations(
            series, ar_params=-res.polynomial_reduced_ar[1:],
            ma_params=res.polynomial_reduced_ma[1:])[0],
        'combined filters': filtered}))
    

    gives:

        ARIMA resid  arma_innovations combined filters
    0      0.496714          0.496714              NaN
    1     -0.100303         -0.100303              NaN
    2      0.625066          0.625066              NaN
    3      1.557418          1.557418              NaN
    4     -0.209256         -0.209256        -0.205201
    ..          ...               ...              ...
    95    -1.476702         -1.476702        -1.476702
    96     0.269118          0.269118         0.269118
    97     0.230697          0.230697         0.230697
    98    -0.004561         -0.004561        -0.004561
    99    -0.233466         -0.233466        -0.233466