I am attempting to reproduce the filters used in an ARIMA model using stastmodels recursive_filter
and convolution_filter
. (My end objective is to use these filters for prewhitening an exogenous series.)
I begin by working with just an AR model and recursive filter. Here is the simplified experimental setup:
import numpy as np
import statsmodels as sm
np.random.seed(42)
# sample data
series = sm.tsa.arima_process.arma_generate_sample(ar=(1,-0.2,-0.5), ma=(1,), nsample=100)
model = sm.tsa.arima.model.ARIMA(series, order=(2,0,0)).fit()
print(model.summary())
Which gracefully produces the following, which seems fair enough:
SARIMAX Results
==============================================================================
Dep. Variable: y No. Observations: 100
Model: ARIMA(2, 0, 0) Log Likelihood -131.991
Date: Wed, 07 Apr 2021 AIC 271.982
Time: 12:58:39 BIC 282.403
Sample: 0 HQIC 276.200
- 100
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -0.3136 0.266 -1.179 0.238 -0.835 0.208
ar.L1 0.2135 0.084 2.550 0.011 0.049 0.378
ar.L2 0.4467 0.101 4.427 0.000 0.249 0.645
sigma2 0.8154 0.126 6.482 0.000 0.569 1.062
===================================================================================
Ljung-Box (L1) (Q): 0.10 Jarque-Bera (JB): 0.53
Prob(Q): 0.75 Prob(JB): 0.77
Heteroskedasticity (H): 0.98 Skew: -0.16
Prob(H) (two-sided): 0.96 Kurtosis: 2.85
===================================================================================
I fit an AR(2) and obtain coefficients for lag 1 and 2 based on the SARIMAX results. My intuition for reproducing this model using statsmodels.tsa.filters.filtertools.recursive_filter
is like so:
filtered = sm.tsa.filters.filtertools.recursive_filter(series, ar_coeff=(-0.2135, -0.4467))
(And maybe adding in the constant from the regression results as well). However, a direct comparison of the results shows the recursive filter doesn't replicate the AR model:
import matploylib.pyplot as plt
# ARIMA residuals
plt.plot(model.resid)
# Calculated residuals using recursive filter outcome
plt.plot(filtered)
Am I approaching this incorrectly? Should I be using a different filter function? The next step for me is to perform the same task but on an MA model so that I can add (?) the results together to obtain a full ARMA filter for prewhitening.
Note: this question may be valuable to somebody searching for "how can I prewhiten timeseries data?" particularly in Python using statsmodels.
I guess you should use convolution_filter
for the AR part and recursive_filter
for the MA part. Combining these sequentially will work for ARMA models. Alternatively, you can use arma_innovations
for an exact approach that works with both AR and MA parts simultaneously. Here are some examples:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.innovations import arma_innovations
AR(2)
np.random.seed(42)
series = sm.tsa.arma_generate_sample(ar=(1, -0.2, -0.5), ma=(1,), nsample=100)
res = sm.tsa.arima.ARIMA(series, order=(2, 0, 0), trend='n').fit()
print(pd.DataFrame({
'ARIMA resid': res.resid,
'arma_innovations': arma_innovations.arma_innovations(
series, ar_params=res.params[:-1])[0],
'convolution filter': sm.tsa.filters.convolution_filter(
series, np.r_[1, -res.params[:-1]], nsides=1)}))
gives:
ARIMA resid arma_innovations convolution filter
0 0.496714 0.496714 NaN
1 -0.254235 -0.254235 NaN
2 0.666326 0.666326 0.666326
3 1.493315 1.493315 1.493315
4 -0.256708 -0.256708 -0.256708
.. ... ... ...
95 -1.438670 -1.438670 -1.438670
96 0.323470 0.323470 0.323470
97 0.218243 0.218243 0.218243
98 0.012264 0.012264 0.012264
99 -0.245584 -0.245584 -0.245584
MA(1)
np.random.seed(42)
series = sm.tsa.arma_generate_sample(ar=(1,), ma=(1, 0.2), nsample=100)
res = sm.tsa.arima.ARIMA(series, order=(0, 0, 1), trend='n').fit()
print(pd.DataFrame({
'ARIMA resid': res.resid,
'arma_innovations': arma_innovations.arma_innovations(
series, ma_params=res.params[:-1])[0],
'convolution filter': sm.tsa.filters.recursive_filter(series, -res.params[:-1])}))
gives:
ARIMA resid arma_innovations recursive filter
0 0.496714 0.496714 0.496714
1 -0.132893 -0.132893 -0.136521
2 0.646110 0.646110 0.646861
3 1.525620 1.525620 1.525466
4 -0.229316 -0.229316 -0.229286
.. ... ... ...
95 -1.464786 -1.464786 -1.464786
96 0.291233 0.291233 0.291233
97 0.263055 0.263055 0.263055
98 0.005637 0.005637 0.005637
99 -0.234672 -0.234672 -0.234672
ARMA(1, 1)
np.random.seed(42)
series = sm.tsa.arma_generate_sample(ar=(1, 0.5), ma=(1, 0.2), nsample=100)
res = sm.tsa.arima.ARIMA(series, order=(1, 0, 1), trend='n').fit()
a = res.resid
# Apply the recursive then convolution filter
tmp = sm.tsa.filters.recursive_filter(series, -res.params[1:2])
filtered = sm.tsa.filters.convolution_filter(tmp, np.r_[1, -res.params[:1]], nsides=1)
print(pd.DataFrame({
'ARIMA resid': res.resid,
'arma_innovations': arma_innovations.arma_innovations(
series, ar_params=res.params[:1], ma_params=res.params[1:2])[0],
'combined filters': filtered}))
gives:
ARIMA resid arma_innovations combined filters
0 0.496714 0.496714 NaN
1 -0.134253 -0.134253 -0.136915
2 0.668094 0.668094 0.668246
3 1.507288 1.507288 1.507279
4 -0.193560 -0.193560 -0.193559
.. ... ... ...
95 -1.448784 -1.448784 -1.448784
96 0.268421 0.268421 0.268421
97 0.212966 0.212966 0.212966
98 0.046281 0.046281 0.046281
99 -0.244725 -0.244725 -0.244725
SARIMA(1, 0, 1)x(1, 0, 0, 3)
The seasonal model is a little more complicated, because it requires multiplying lag polynomials. For additional details, see this example notebook from the Statsmodels documentation.
np.random.seed(42)
ar_poly = [1, -0.5]
sar_poly = [1, 0, 0, -0.1]
ar = np.polymul(ar_poly, sar_poly)
series = sm.tsa.arma_generate_sample(ar=ar, ma=(1, 0.2), nsample=100)
res = sm.tsa.arima.ARIMA(series, order=(1, 0, 1), seasonal_order=(1, 0, 0, 3), trend='n').fit()
a = res.resid
# Apply the recursive then convolution filter
tmp = sm.tsa.filters.recursive_filter(series, -res.polynomial_reduced_ma[1:])
filtered = sm.tsa.filters.convolution_filter(tmp, res.polynomial_reduced_ar, nsides=1)
print(pd.DataFrame({
'ARIMA resid': res.resid,
'arma_innovations': arma_innovations.arma_innovations(
series, ar_params=-res.polynomial_reduced_ar[1:],
ma_params=res.polynomial_reduced_ma[1:])[0],
'combined filters': filtered}))
gives:
ARIMA resid arma_innovations combined filters
0 0.496714 0.496714 NaN
1 -0.100303 -0.100303 NaN
2 0.625066 0.625066 NaN
3 1.557418 1.557418 NaN
4 -0.209256 -0.209256 -0.205201
.. ... ... ...
95 -1.476702 -1.476702 -1.476702
96 0.269118 0.269118 0.269118
97 0.230697 0.230697 0.230697
98 -0.004561 -0.004561 -0.004561
99 -0.233466 -0.233466 -0.233466