Tags: python, statsmodels, kalman-filter, state-space

Smoothed state disturbances in statsmodels' state space model


I am building a custom state-space model with statsmodels as follows:

import numpy as np
from statsmodels.tsa.statespace import mlemodel

mod = mlemodel.MLEModel(YY, k_states=n_st, k_posdef=n_sh)
mod['design'] = Z
mod['transition'] = T
mod['selection'] = R
mod['state_cov'] = np.eye(n_sh)
mod['obs_intercept'] = d

mod.initialize_stationary()

I am interested in the smoothed states and smoothed state disturbances, which I get with

results = mod.smooth([])

The smoothed states in results.smoothed_state are correct (I have the true values against which I am comparing), but the smoothed state disturbances in results.smoothed_state_disturbance are shifted forward by one period: the first column contains the (correct) smoothed disturbances for the second period, and so on, while the last column contains zeros, which are the correct smoothed disturbances for one period after the end of the sample.

My understanding is that this has to do with the timing of the state equation, which according to the statsmodels docs here is

alpha(t+1) = T alpha(t) + R eta(t) (1)

and therefore implies that the first observation y(1) is related to the state alpha(1), which in turn depends on the disturbance eta(0), whose smoothed value is not returned by the smoother. On the other hand, in these statsmodels docs, the timing of the state equation is

alpha(t) = T alpha(t-1) + R eta(t) (2)

and implies that the state alpha(1) depends on eta(1), not on eta(0). Since both (future, (1), and contemporaneous, (2)) timing conventions appear in the statsmodels docs, I thought it would be possible to choose which one to use. Unfortunately, I haven't been able to find out how. I tried changing the smoother timing with results = mod.smooth([], filter_timing=1), which according to the docs uses the Kim and Nelson (1999) (contemporaneous) timing rather than the default Durbin and Koopman (2012) (future) timing. But then I get totally different (and wrong, since I know the true values) results, not only from the smoother but also for the value of the log-likelihood. I also looked for examples in the unit tests for smoothing, but there are only tests against MATLAB and R libraries that also use the future timing, and there are no tests (for disturbance smoothing) against STATA, which uses the alternative contemporaneous timing.

My question is: is there a way either to write the state equation with the contemporaneous timing ((2) above), or to recover the smoothed state disturbance associated with the observed data in the first period?

Here is some code for the following AR(1) model with measurement error, using contemporaneous timing for the state equation and initialized with the stationary distribution:

alpha(0) ~ N(0, 1/(1-.5**2))

alpha(t) = .5 alpha(t-1) + eta(t), eta(t) ~ N(0, 1)

y(t) = alpha(t) + e(t), e(t) ~ N(0, 1)

from statsmodels.tsa.statespace import mlemodel
import numpy as np
from scipy.stats import multivariate_normal

from numpy.random import default_rng
gen = default_rng(42)

T = np.array([.5])
Z = np.array([1.])
Q = np.array([1.])
H = np.array([1.])
R = np.array([1.])

P0 = 1/(1-T**2)

# Simulate data for 2 periods

alpha0 = gen.normal(0, np.sqrt(P0))
eta1 = gen.normal(0, 1)
e1 = gen.normal(0, 1)
eta2 = gen.normal(0, 1)
e2 = gen.normal(0, 1)

alpha1 = .5*alpha0 + eta1
y1 = alpha1 + e1

alpha2 = .5*alpha1 + eta2
y2 = alpha2 + e2

First, use statsmodels.tsa.statespace to compute the smoothed state, smoothed state disturbance, and log-likelihood given just the first data point:

mod1 = mlemodel.MLEModel(y1, k_states=1, k_posdef=1)

mod1['design'] = Z
mod1['transition'] = T
mod1['selection'] = R
mod1['state_cov'] = Q
mod1['obs_cov'] = H

mod1.initialize_stationary()
results1 = mod1.smooth([])

results1.smoothed_state, results1.smoothed_state_disturbance, results1.llf

gives

(array([[-0.06491681]]), array([[0.]]), -1.3453530272821392)

Note that having observed y(1), we can compute the conditional expectation of eta(1); however, what is returned here is only the conditional expectation of eta(2). Since the model is stationary and Gaussian, the conditional expectations of alpha(1) and eta(1) given y(1) can be computed from their joint distribution (see here for the relevant formulae), as shown in the following code.
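
(For reference, these are the standard formulae for jointly Gaussian vectors: with zero means, E[x | y] = Cov(x, y) Var(y)^{-1} y, and the log-likelihood of y is the log-density of N(0, Var(y)) evaluated at the observed value; the code below applies both directly.)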

# Define a matrix L1 which maps [alpha(0), eta(1), e(1)] into [alpha0, eta1, e1, alpha1, y1]

L1 = np.vstack((np.eye(3),          # alpha(0), eta(1), e(1)
                np.r_[T, 1, 0],     # alpha(1)
                np.r_[T, 1, 1],     # y(1)           
              ))
# check
np.testing.assert_array_equal(np.r_[alpha0, eta1, e1, alpha1, y1],
                              L1 @ np.r_[alpha0, eta1, e1])

# Compute Sigma1 as the covariance matrix of [alpha0, eta1, e1, alpha1, y1]

D1 = np.eye(3)
D1[0, 0] = P0
Sigma1 = L1 @ D1 @ L1.T

# [alpha0, eta1, e1, alpha1, y1] has a multivariate Normal distribution, and we can apply well-known formulae to compute conditional expectations and the log-likelihood

ind_eta1 = 1
ind_e1 = 2
ind_alpha1 = 3
ind_y1 = 4

smooth_eta1 = (Sigma1[ind_eta1, ind_y1]/Sigma1[ind_y1, ind_y1])*y1
smooth_alpha1 = (Sigma1[ind_alpha1, ind_y1]/Sigma1[ind_y1, ind_y1])*y1
loglik1 = multivariate_normal.logpdf(y1, cov=Sigma1[ind_y1, ind_y1])

smooth_alpha1, smooth_eta1, loglik1

which gives

(array([-0.06491681]), array([-0.04868761]), -1.3453530272821392)
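
As a quick consistency check (assuming results1 from the statsmodels run above is still in scope), the smoothed state and log-likelihood agree; only the disturbance estimate differs:

np.testing.assert_allclose(results1.smoothed_state[0, 0], smooth_alpha1[0])
np.testing.assert_allclose(results1.llf, loglik1)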

Extending to the first two periods with statsmodels:

y = np.array([y1, y2])
mod2 = mlemodel.MLEModel(y, k_states=1, k_posdef=1)

mod2['design'] = Z
mod2['transition'] = T
mod2['selection'] = R
mod2['state_cov'] = Q
mod2['obs_cov'] = H

mod2.initialize_stationary()
results2 = mod2.smooth([])

results2.smoothed_state, results2.smoothed_state_disturbance, results2.llf

gives

(array([[-0.25292213, -0.78447967]]),
 array([[-0.65801861,  0.        ]]),
 -3.1092778246103645)

And computing the conditional expectations from the joint distribution

# L2 maps [alpha(0), eta(1), e(1), eta(2), e(2)] into [alpha0, eta1, e1, eta2, e2, alpha1, alpha2, y1, y2]

L2 = np.vstack((np.eye(5), # alpha(0), eta(1), e(1), eta(2), e(2)
                np.r_[T,    1, 0, 0, 0],     # alpha(1)
                np.r_[T**2, T, 0, 1, 0],     # alpha(2)
                np.r_[T,    1, 1, 0, 0],     # y(1)           
                np.r_[T**2, T, 0, 1, 1],     # y(2)
              ))

np.testing.assert_array_equal(np.r_[alpha0, eta1, e1, eta2, e2, alpha1, alpha2, y1, y2],
                              L2 @ np.r_[alpha0, eta1, e1, eta2, e2])

# Sigma2 is the covariance of [alpha0, eta1, e1, eta2, e2, alpha1, alpha2, y1, y2]

D2 = np.eye(5)
D2[0, 0] = P0
Sigma2 = L2 @ D2 @ L2.T

ind_e = [2, 4]
ind_eta = [1, 3]
ind_alpha = [5, 6]
ind_y = [7, 8]

# compute smoothed disturbances and states, and loglikelihood

smooth_eta = Sigma2[ind_eta, :][:, ind_y] @ np.linalg.solve(Sigma2[ind_y, :][:, ind_y], y)
smooth_alpha = Sigma2[ind_alpha, :][:, ind_y] @ np.linalg.solve(Sigma2[ind_y, :][:, ind_y], y)
loglik2 = multivariate_normal.logpdf(y.flatten(), cov=Sigma2[ind_y, :][:, ind_y])

smooth_alpha.flatten(), smooth_eta.flatten(), loglik2

gives

(array([-0.25292213, -0.78447967]),
 array([-0.1896916 , -0.65801861]),
 -3.1092778246103636)

The smoothed states alpha(t) and the log-likelihood values are the same, but the smoothed disturbances returned by statsmodels.tsa.statespace.mlemodel are those for eta(2) and eta(3).
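
To see the shift concretely (a quick check using the objects computed above), statsmodels' first smoothed disturbance matches the joint-distribution estimate of eta(2), not eta(1):

np.testing.assert_allclose(results2.smoothed_state_disturbance[0, 0],
                           smooth_eta.flatten()[1])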


Solution

  • Shortest answer

    The short answer to your question is that you can do the following to recover the smoothed estimate that you're looking for:

    r_0 = results1.smoother_results.scaled_smoothed_estimator_presample
    R = mod1['selection']
    Q = mod1['state_cov']
    eta_hat_0 = Q @ R.T @ r_0
    print(eta_hat_0)
    

    This gives [-0.04868761], as you wanted. It comes from the usual disturbance smoother equation (e.g. equation 4.69 of Durbin and Koopman's 2012 book), applied to period 0. In Statsmodels, we only store the smoothed disturbance estimates for the sample period, t = 1, ..., nobs, but we do store r_0, as given above, so you can compute the value you want directly.
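
    As a quick sanity check (a sketch, assuming smooth_eta1 from the joint-distribution computation in your question is still in scope), the two approaches agree:

    # the disturbance smoother output matches the joint-distribution estimate
    np.testing.assert_allclose(np.ravel(eta_hat_0), np.ravel(smooth_eta1))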

    Alternative, maybe easier, method

    This is an alternative way to get the same output, which we can derive by writing your problem differently. Under the timing convention of Statsmodels, as you point out, the distribution of the first state, alpha(1) ~ N(a_1, P_1), is specified as a prior. We can view your desired model as instead specifying the 0th state, alpha(0) ~ N(a_0, P_0), as the prior, and then consider the first state as being produced by the transition equation: alpha(1) = T alpha(0) + R eta(0).

    This is now written using the timing convention of Statsmodels, and we can use Statsmodels to compute the smoothed results. The only trick is that there is no observation y(0) associated with the state alpha(0), because we only include the alpha(0) -> alpha(1) transition step so that we can specify the prior as you wanted. But that is no problem: we can simply include a missing value.

    So if we just modify your original model by putting a NaN presample value into the input data, we get the result we want:

    mod1 = mlemodel.MLEModel(np.r_[np.nan, y1], k_states=1, k_posdef=1)
    
    mod1['design'] = Z
    mod1['transition'] = T
    mod1['selection'] = R
    mod1['state_cov'] = Q
    mod1['obs_cov'] = H
    
    mod1.initialize_stationary()
    results1 = mod1.smooth([])
    
    print(results1.smoothed_state, results1.smoothed_state_disturbance, results1.llf)
    

    yields:

    [[-0.03245841 -0.06491681]] [[-0.04868761  0.        ]] -1.3453530272821392
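
    The same trick extends to your two-period model. A sketch (the names mod2b and results2b are just for illustration; y1, y2 and the system matrices are those defined in your question): prepend a NaN, then drop the presample column from the smoothed states and the final (zero) column from the smoothed disturbances.

    mod2b = mlemodel.MLEModel(np.r_[np.nan, y1, y2], k_states=1, k_posdef=1)

    mod2b['design'] = Z
    mod2b['transition'] = T
    mod2b['selection'] = R
    mod2b['state_cov'] = Q
    mod2b['obs_cov'] = H

    mod2b.initialize_stationary()
    results2b = mod2b.smooth([])

    # column 0 corresponds to the presample period
    smoothed_alpha = results2b.smoothed_state[:, 1:]             # alpha(1), alpha(2)
    smoothed_eta = results2b.smoothed_state_disturbance[:, :-1]  # eta(1), eta(2)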
    

    Note on the filter_timing flag:

    The filter_timing flag also allows you to change the timing convention for the prior, but not for the states. If you set the prior alpha(0|0) ~ N(a_00, P_00) in the alternative timing (filter_timing=1), then setting alpha(1) ~ N(a_1, P_1) with a_1 = T a_00 and P_1 = T P_00 T' + R Q R' in the default timing (filter_timing=0) will give you exactly the same results.
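
    For example, the following sketch (the prior covariance P_00 is an arbitrary, hypothetical value, and mod_a/mod_b are illustrative names; y and the system matrices are those from your question) should produce identical log-likelihoods under the two conventions:

    a_00 = np.zeros(1)
    P_00 = np.array([[2.]])  # arbitrary, hypothetical prior covariance

    # alternative (Kim and Nelson) timing: prior on alpha(0|0)
    mod_a = mlemodel.MLEModel(y, k_states=1, k_posdef=1)
    mod_a['design'] = Z
    mod_a['transition'] = T
    mod_a['selection'] = R
    mod_a['state_cov'] = Q
    mod_a['obs_cov'] = H
    mod_a.initialize_known(a_00, P_00)
    res_a = mod_a.smooth([], filter_timing=1)

    # default (Durbin and Koopman) timing: the implied prior on alpha(1)
    a_1 = T * a_00
    P_1 = T * P_00 * T + R * Q * R  # scalar case of T P_00 T' + R Q R'
    mod_b = mlemodel.MLEModel(y, k_states=1, k_posdef=1)
    mod_b['design'] = Z
    mod_b['transition'] = T
    mod_b['selection'] = R
    mod_b['state_cov'] = Q
    mod_b['obs_cov'] = H
    mod_b.initialize_known(a_1, P_1)
    res_b = mod_b.smooth([])

    print(res_a.llf, res_b.llf)  # the two log-likelihoods should match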