
Forecasting out-of-sample with exogenous variables using the time-varying regression example code in Statsmodels (Python)


My question is: how do I forecast out-of-sample values with exogenous predictors using the Statsmodels state-space class TVRegression and the custom data provided in the example (see the link below)?

I have spent several hours searching for examples of how to forecast out-of-sample values when the regression model contains exogenous variables. I want to build a simple Dynamic Linear Model class. I found a class in the Statsmodels documentation, TVRegression (https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_custom_models.html), that should solve my problem. The TVRegression class takes a response variable and two exogenous predictors as arguments. I copied and pasted the code and ran the example from the link above without a problem. However, I am unable to produce a simple out-of-sample forecast, even using the example data given.

The TVRegression class is a child class of sm.tsa.statespace.MLEModel and thus should inherit all of the associated methods. The results object returned by fit() is an MLEResults instance, and one of its methods is forecast(). According to the User Guide, I should be able to provide a simple steps argument and get an out-of-sample forecast: MLEResults.forecast(steps=1, **kwargs).

The code used to generate the dependent variable y_t and the independent (exogenous) variables x_t and w_t:

import numpy as np
import statsmodels.api as sm


def gen_data_for_model1():
    nobs = 1000

    rs = np.random.RandomState(seed=93572)

    d = 5
    var_y = 5
    var_coeff_x = 0.01
    var_coeff_w = 0.5

    x_t = rs.uniform(size=nobs)
    w_t = rs.uniform(size=nobs)
    eps = rs.normal(scale=var_y ** 0.5, size=nobs)


    beta_x = np.cumsum(rs.normal(size=nobs, scale=var_coeff_x ** 0.5))
    beta_w = np.cumsum(rs.normal(size=nobs, scale=var_coeff_w ** 0.5))

    y_t = d + beta_x * x_t + beta_w * w_t + eps
    return y_t, x_t, w_t, beta_x, beta_w

y_t, x_t, w_t, beta_x, beta_w = gen_data_for_model1()

The TVRegression class from the link provided above:

class TVRegression(sm.tsa.statespace.MLEModel):
    def __init__(self, y_t, x_t, w_t):
        exog = np.c_[x_t, w_t]  # shaped nobs x 2

        super(TVRegression, self).__init__(
            endog=y_t, exog=exog, k_states=2, initialization="diffuse"
        )

        # Since the design matrix is time-varying, it must be
        # shaped k_endog x k_states x nobs
        # Notice that exog.T is shaped k_states x nobs, so we
        # just need to add a new first axis with shape 1
        self.ssm["design"] = exog.T[np.newaxis, :, :]  # shaped 1 x 2 x nobs
        self.ssm["selection"] = np.eye(self.k_states)
        self.ssm["transition"] = np.eye(self.k_states)

        # Which parameters need to be positive?
        self.positive_parameters = slice(1, 4)

    @property
    def param_names(self):
        return ["intercept", "var.e", "var.x.coeff", "var.w.coeff"]

    @property
    def start_params(self):
        """
        Defines the starting values for the parameters
        The linear regression gives us reasonable starting values for the constant
        d and the variance of the epsilon error
        """
        exog = sm.add_constant(self.exog)
        res = sm.OLS(self.endog, exog).fit()
        params = np.r_[res.params[0], res.scale, 0.001, 0.001]
        return params

    def transform_params(self, unconstrained):
        """
        We constrain the last three parameters
        ('var.e', 'var.x.coeff', 'var.w.coeff') to be positive,
        because they are variances
        """
        constrained = unconstrained.copy()
        constrained[self.positive_parameters] = (
            constrained[self.positive_parameters] ** 2
        )
        return constrained

    def untransform_params(self, constrained):
        """
        Need to untransform all the parameters you transformed
        in the `transform_params` function
        """
        unconstrained = constrained.copy()
        unconstrained[self.positive_parameters] = (
            unconstrained[self.positive_parameters] ** 0.5
        )
        return unconstrained

    def update(self, params, **kwargs):
        params = super(TVRegression, self).update(params, **kwargs)

        self["obs_intercept", 0, 0] = params[0]
        self["obs_cov", 0, 0] = params[1]
        self["state_cov"] = np.diag(params[2:4])

Fitting the model to the simulated data and printing the results:

mod = TVRegression(y_t, x_t, w_t)
res = mod.fit()

print(res.summary())

What I want is to at least accomplish the following without error:

res.forecast(steps = 5)

Ideally, I would also like help constructing the exog argument so that new values of x_t and w_t can be passed as exogenous predictors for this class.

What I have tried so far:

  1. I set self.k_exog in the __init__ method of the class in response to the first error.

  2. On my second attempt I received the following ValueError:

    ValueError: Out-of-sample operations in a model with a regression component require additional exogenous values via the exog argument.

  3. I attempted to supply the exogenous variables by passing new values, so that the number of steps equals the length of the slice of data.

  • e.g. res.forecast(steps=5, exog=np.c_[w_t[:5], x_t[:5]])

Solution

  • The easiest way to do this is as follows.

    First, redefine your constructor as:

    def __init__(self, y_t, exog):
        super(TVRegression, self).__init__(
            endog=y_t, exog=exog, k_states=2, initialization="diffuse"
        )
        self.k_exog = self.exog.shape[1]
    
        # Since the design matrix is time-varying, it must be
        # shaped k_endog x k_states x nobs
        # Notice that exog.T is shaped k_states x nobs, so we
        # just need to add a new first axis with shape 1
        self.ssm["design"] = exog.T[np.newaxis, :, :]  # shaped 1 x 2 x nobs
        self.ssm["selection"] = np.eye(self.k_states)
        self.ssm["transition"] = np.eye(self.k_states)
    
        # Which parameters need to be positive?
        self.positive_parameters = slice(1, 4)
    

    Then add a new method:

    def clone(self, endog, exog=None, **kwargs):
        return self._clone_from_init_kwds(endog, exog=exog, **kwargs)
    

    Now you can do e.g.:

    exog_t = np.c_[x_t, w_t]
    mod = TVRegression(y_t[:-5], exog=exog_t[:-5])
    res = mod.fit()
    print(res.summary())
    
    res.forecast(steps=5, exog=exog_t[-5:])
    

    Discussion:

    The issue here is that when your model depends on these exog variables, then you need updated values for exog for the out-of-sample part of the forecast (since the exog you provided when constructing the model only contains in-sample values).
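To make this concrete, here is a minimal NumPy sketch of the forecast mean for a model of this kind. It assumes the coefficients follow random walks (identity transition matrix), so their predicted values stay at the last estimate. The values of d_hat, beta_last, x_future and w_future are made up for illustration and are not estimates from the data above.

```python
import numpy as np

# Hypothetical values for illustration only (not fitted estimates).
d_hat = 5.0                         # intercept
beta_last = np.array([0.8, -1.2])   # last coefficient estimates [beta_x, beta_w]

# Future regressors: one row per forecast step, with columns in the same
# order as the in-sample exog, i.e. [x_t, w_t].
x_future = np.array([0.1, 0.5, 0.9])
w_future = np.array([0.2, 0.4, 0.6])
exog_future = np.c_[x_future, w_future]   # shape (steps, 2)

# With an identity transition, the predicted coefficients stay at beta_last,
# so the forecast mean is d + x * beta_x + w * beta_w for each future step.
forecast_mean = d_hat + exog_future @ beta_last

print(exog_future.shape)
print(forecast_mean)
```

Without exog_future there is nothing to multiply the coefficients by, which is why a steps argument alone is not enough for this model.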

    To perform the forecasting, Statsmodels has to be able to essentially create a new model representation using the new out-of-sample exog. This is what the clone method is for - it describes how to create a new copy of the model, but with a different dataset. Since this model is simple, there is an existing generic cloning method (_clone_from_init_kwds) that we can hook into.

    Finally, we need to pass the out-of-sample exog values to the forecast method, using the exog argument.