My question is: how do I forecast out-of-sample values with exogenous predictors using the Statsmodels state-space class TVRegression and the custom data provided in the example (see link below)?

I have spent several hours searching for examples of how to forecast out-of-sample values when the regression model contains exogenous variables. I want to build a simple Dynamic Linear Model class. I found a class in the Statsmodels documentation, TVRegression (see https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_custom_models.html), that should solve my problem. The TVRegression class takes two exogenous predictors and a response variable as arguments. I copied and pasted the code and ran the example in the link above without a problem. However, I am unable to produce a simple out-of-sample forecast, even using the example data given.

The TVRegression class is a child class of sm.tsa.statespace.MLEModel and thus should inherit all of the associated methods. The corresponding results class provides a forecast() method, and according to the User Guide I should be able to pass a simple steps argument and get an out-of-sample forecast: MLEResults.forecast(steps=1, **kwargs).

The code used to generate the dependent variable y_t and the independent (exog) variables x_t and w_t:
    import numpy as np
    import statsmodels.api as sm

    def gen_data_for_model1():
        nobs = 1000

        rs = np.random.RandomState(seed=93572)

        # Constant intercept and the variances of the noise terms
        d = 5
        var_y = 5
        var_coeff_x = 0.01
        var_coeff_w = 0.5

        x_t = rs.uniform(size=nobs)
        w_t = rs.uniform(size=nobs)
        eps = rs.normal(scale=var_y ** 0.5, size=nobs)

        # The regression coefficients follow random walks
        beta_x = np.cumsum(rs.normal(size=nobs, scale=var_coeff_x ** 0.5))
        beta_w = np.cumsum(rs.normal(size=nobs, scale=var_coeff_w ** 0.5))

        y_t = d + beta_x * x_t + beta_w * w_t + eps
        return y_t, x_t, w_t, beta_x, beta_w

    y_t, x_t, w_t, beta_x, beta_w = gen_data_for_model1()
The TVRegression class from the link provided above:
    class TVRegression(sm.tsa.statespace.MLEModel):
        def __init__(self, y_t, x_t, w_t):
            exog = np.c_[x_t, w_t]  # shaped nobs x 2

            super(TVRegression, self).__init__(
                endog=y_t, exog=exog, k_states=2, initialization="diffuse"
            )

            # Since the design matrix is time-varying, it must be
            # shaped k_endog x k_states x nobs
            # Notice that exog.T is shaped k_states x nobs, so we
            # just need to add a new first axis with shape 1
            self.ssm["design"] = exog.T[np.newaxis, :, :]  # shaped 1 x 2 x nobs
            self.ssm["selection"] = np.eye(self.k_states)
            self.ssm["transition"] = np.eye(self.k_states)

            # Which parameters need to be positive?
            self.positive_parameters = slice(1, 4)

        @property
        def param_names(self):
            return ["intercept", "var.e", "var.x.coeff", "var.w.coeff"]

        @property
        def start_params(self):
            """
            Defines the starting values for the parameters
            The linear regression gives us reasonable starting values for the constant
            d and the variance of the epsilon error
            """
            exog = sm.add_constant(self.exog)
            res = sm.OLS(self.endog, exog).fit()
            params = np.r_[res.params[0], res.scale, 0.001, 0.001]
            return params

        def transform_params(self, unconstrained):
            """
            We constrain the last three parameters
            ('var.e', 'var.x.coeff', 'var.w.coeff') to be positive,
            because they are variances
            """
            constrained = unconstrained.copy()
            constrained[self.positive_parameters] = (
                constrained[self.positive_parameters] ** 2
            )
            return constrained

        def untransform_params(self, constrained):
            """
            Need to untransform all the parameters you transformed
            in the `transform_params` function
            """
            unconstrained = constrained.copy()
            unconstrained[self.positive_parameters] = (
                unconstrained[self.positive_parameters] ** 0.5
            )
            return unconstrained

        def update(self, params, **kwargs):
            params = super(TVRegression, self).update(params, **kwargs)

            self["obs_intercept", 0, 0] = params[0]
            self["obs_cov", 0, 0] = params[1]
            self["state_cov"] = np.diag(params[2:4])
The simple results from fit using the fake generated data:
    mod = TVRegression(y_t, x_t, w_t)
    res = mod.fit()
    print(res.summary())
What I want is to at least accomplish the following without error:
    res.forecast(steps=5)

Ideally I could get help on how to construct the exog argument so that it accepts new values of x_t and w_t as exogenous predictors for this class.
What I have tried so far:
I added self.k_exog in the __init__ section of the class code in response to the first error.
In my second attempt I received the following ValueError:

    ValueError: Out-of-sample operations in a model with a regression component require additional exogenous values via the exog argument.

I have attempted to add the exogenous variables by concatenating new values, so that the number of steps equals the length of the new slice of data.
Solution: The easiest way to do this is:
First, redefine your constructor as:
    def __init__(self, y_t, exog):
        super(TVRegression, self).__init__(
            endog=y_t, exog=exog, k_states=2, initialization="diffuse"
        )
        self.k_exog = self.exog.shape[1]

        # Since the design matrix is time-varying, it must be
        # shaped k_endog x k_states x nobs
        # Notice that exog.T is shaped k_states x nobs, so we
        # just need to add a new first axis with shape 1
        self.ssm["design"] = exog.T[np.newaxis, :, :]  # shaped 1 x 2 x nobs
        self.ssm["selection"] = np.eye(self.k_states)
        self.ssm["transition"] = np.eye(self.k_states)

        # Which parameters need to be positive?
        self.positive_parameters = slice(1, 4)
Then add a new method to the class:

    def clone(self, endog, exog=None, **kwargs):
        return self._clone_from_init_kwds(endog, exog=exog, **kwargs)
Now you can do e.g.:
    exog_t = np.c_[x_t, w_t]

    # Hold out the last 5 observations for the out-of-sample forecast
    mod = TVRegression(y_t[:-5], exog=exog_t[:-5])
    res = mod.fit()
    print(res.summary())

    res.forecast(steps=5, exog=exog_t[-5:])
Discussion:
The issue here is that when your model depends on these exog variables, you need updated values for exog for the out-of-sample part of the forecast (since the exog you provided when constructing the model only contains in-sample values).

To perform the forecasting, Statsmodels has to be able to essentially create a new model representation using the new out-of-sample exog. This is what the clone method is for: it describes how to create a new copy of the model, but with a different dataset. Since this model is simple, there is an existing generic cloning method (_clone_from_init_kwds) that we can hook into.

Finally, we need to pass the out-of-sample exog values to the forecast method, using the exog argument.
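
To make the role of the out-of-sample exog more concrete, here is a minimal NumPy-only sketch of what a point forecast amounts to in this particular model. The values of intercept, last_state, x_future and w_future are hypothetical numbers chosen for illustration, not fitted results. The calculation assumes the identity transition matrix used in the class above, so the predicted coefficients simply stay at their last filtered values. It is meant to show why new exog rows are required, not how Statsmodels carries out the computation internally.

```python
import numpy as np

# Hypothetical values, for illustration only (not fitted results).
intercept = 5.0                     # plays the role of the "intercept" parameter
last_state = np.array([0.8, -1.5])  # last filtered coefficients on x_t and w_t

# Out-of-sample exog: one row per forecast step, one column per regressor.
x_future = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
w_future = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
exog_future = np.c_[x_future, w_future]  # shaped steps x 2

# The time-varying design matrix for the forecast period has the same layout
# as in the constructor: k_endog x k_states x steps.
design_future = exog_future.T[np.newaxis, :, :]  # shaped 1 x 2 x steps

# With an identity transition matrix the predicted coefficients do not change,
# so each point forecast is intercept + (row of exog_future) @ last_state.
point_forecast = intercept + exog_future @ last_state

print(design_future.shape)
print(point_forecast)
```

Without the rows of exog_future there is no design matrix for the forecast period, which is why forecast needs the exog argument and why its first dimension has to match steps.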