python time-series simulation statsmodels arima

How to do SARIMAX simulation with given errors [Statsmodels]?

I'm doing a time series project. I fitted a SARIMAX model using the statsmodels statespace SARIMAX function (code below), then I want to simulate some future paths with errors created by me.

The base code for generating a SARIMAX simulation is the following:

from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(time_series, order=order, seasonal_order=seasonal_order,
                simple_differencing=False, trend=trend).fit(disp=False, maxiter=3000)

initial = model.predicted_state[:,-1]

simulation = model.simulate(nsimulations=20, anchor='end', repetitions=1000, initial_state=initial)

I now want to insert my errors, which I generated from a normal distribution. I see that the function simulate takes as input the parameters called measurement_shocks and state_shocks. However it's not clear to me how to use them.

My errors are an array with shape (20, 1000), because I want to pass 20 errors (the number of steps) for all the 1000 repetitions, i.e. all the 1000 paths. I pass them as measurement_shocks. However, this gives me an error telling me I have to pass 20 errors, not 20000 (20*1000). This is strange, since other models (ETSModel) didn't give me that error and everything went fine with them.

Now I try giving the function 20 errors, thinking I can only create one path at a time:

simulation = model.simulate(nsimulations=20, anchor='end', repetitions=1,
                            initial_state=initial, measurement_shocks=errors[:,0])

I get some results. However when I rerun the cell (working on Jupyter), the output changes. Didn't expect that given that I give the function the errors. I realized that that's because the state_shocks are drawn randomly. So I tried fixing the state_shocks to zero (np.zeros((20,1))), but that gave me an output that didn't take into account my errors.

I've played for almost two weeks with initial_state, state_shocks, measurement_shocks, I've checked every output I obtained from these experiments with the sum of the predicted_mean and my errors, I've tried implementing several types of simulation in the attempt to recreate the correct output, but I found no info nor solution.

I'm not familiar with the state space formulation, and I don't think I'm ready to delve into it at the moment.

What should I do?

Solution

Found the answer, I needed to do things a little differently.

simulation = model.simulate(nsimulations=21, anchor=n_data-1, repetitions=1, 
                            initial_state=model.smoothed_state[:, -1], 
                            measurement_shocks=np.zeros((21, 1)), 
                            state_shocks=np.append(errors[:, 0], 0)
                           )

This is because I started the simulation not from the last value of my data, but from the previous one, so as to have the last smoothed state as the first simulated value. Then I passed my errors as state shocks (padded with one zero), while I set the measurement shock as zero. The padding is used to maintain coherence with the number of simulated steps.