Tags: c++, prediction, forecasting, normal-distribution, standard-deviation

Function for testing system stability, which receives predicted time series as input


I want to write a function that gets a time series and a standard deviation as parameters and returns an adjusted time series which looks like a forecast.

I want to use this function to test the stability of a system that receives a forecasted weather time series as an input parameter.

My approach for such a function is shown below, followed by a description of its steps:

vector<tuple<datetime, double>> get_adjusted_timeseries(const vector<tuple<datetime, double>>& timeseries_original, const double stddev, const double dist_mid)
{

    auto timeseries_copy(timeseries_original);

    int sign = randInRange(0, 1) == 0 ? 1 : -1;


    auto left_limit = normal_cdf_inverse(0.5 - dist_mid, 0, stddev);
    auto right_limit = normal_cdf_inverse(0.5 + dist_mid, 0, stddev);

    for (auto& pair : timeseries_copy)
    {
        double nd_value;
        do
        {
            nd_value = normal_distribution_r(0, stddev);
        }
        while ((sign == -1 && nd_value > 0.0) || (sign == 1 && nd_value < 0.0));


        pair = make_tuple(get<0>(pair), get<1>(pair) + (nd_value / 100) * get<1>(pair));


        if ((nd_value > 0.0 && nd_value < right_limit) || (nd_value < 0.0 && nd_value > left_limit))
        {
            sign = sign == -1 ? 1 : -1;
        }
    }

    return timeseries_copy;
}
  • Make a copy of the original time series, which is also of type vector<tuple<datetime, double>>.
  • Get a random number that is either 0 or 1 and use it to set the sign.
  • Use the inverse cumulative distribution function (inverse CDF) to get the limits that determine when the sign changes. The sign changes when the value of the copied time series is close to the original value. The implementation of the inverse CDF was shown in an image (not reproduced here).
  • For each item in the time series:
    • get a normally distributed value, which should be below zero when sign == -1 and above zero when sign == 1
    • adjust the old value of the time series according to the normally distributed value
    • change the sign if the normally distributed value is close to zero, that is, if the adjusted value is close to the original value.
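The helpers randInRange, normal_cdf_inverse and normal_distribution_r are not shown in the question, so the following is only a minimal self-contained sketch of the same steps, under some loud assumptions: timestamps are dropped and the series is a plain std::vector<double>, the inverse CDF is approximated by bisection on std::erfc, and the rejection loop is replaced by sign * |N(0, stddev)|, which has the same distribution. The name adjust_series is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// CDF of N(0, stddev^2), expressed through the complementary error function.
double normal_cdf(double x, double stddev) {
    return 0.5 * std::erfc(-x / (stddev * std::sqrt(2.0)));
}

// Inverse CDF by bisection (assumed stand-in for the question's helper).
double normal_cdf_inverse(double p, double stddev) {
    assert(p > 0.0 && p < 1.0 && stddev > 0.0);
    double lo = -10.0 * stddev;
    double hi = 10.0 * stddev;
    for (int i = 0; i < 200; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (normal_cdf(mid, stddev) < p) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return 0.5 * (lo + hi);
}

// Shift every value by a signed percentage; flip the sign whenever the
// drawn percentage lies inside (-limit, limit), i.e. close to zero.
std::vector<double> adjust_series(const std::vector<double>& original,
                                  double stddev, double dist_mid,
                                  std::mt19937& rng) {
    assert(dist_mid > 0.0 && dist_mid < 0.5);
    std::normal_distribution<double> nd(0.0, stddev);
    std::bernoulli_distribution coin(0.5);
    int sign = coin(rng) ? 1 : -1;
    // By symmetry the left limit is simply -limit.
    const double limit = normal_cdf_inverse(0.5 + dist_mid, stddev);

    std::vector<double> adjusted;
    adjusted.reserve(original.size());
    for (double value : original) {
        const double pct = sign * std::abs(nd(rng));  // half-normal draw
        adjusted.push_back(value + (pct / 100.0) * value);
        if (std::abs(pct) < limit) {
            sign = -sign;
        }
    }
    return adjusted;
}
```

Passing the random engine in by reference keeps the function reproducible: seeding std::mt19937 with a fixed value yields the same adjusted series on every run, which is convenient for a stability test.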

The result for a low standard deviation was shown in yellow in a plot (image not reproduced here).

If the mean absolute percentage error (MAPE) between the two time series is calculated, the following relationship results:

  • stddev: 5 -> MAPE: ~0.04
  • stddev: 10 -> MAPE: ~0.08
  • stddev: 15 -> MAPE: ~0.12
  • stddev: 20 -> MAPE: ~0.16
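These numbers are consistent with what the construction predicts: each value is shifted by |X| percent with X ~ N(0, stddev²), and the mean of |X| is stddev · sqrt(2/π) ≈ 0.798 · stddev, so the MAPE should grow linearly with stddev. A small sketch for checking this follows; the function names mape and expected_mape are hypothetical, and the MAPE is returned as a fraction rather than a percentage.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute percentage error as a fraction (0.04 means 4 %).
double mape(const std::vector<double>& actual,
            const std::vector<double>& forecast) {
    assert(actual.size() == forecast.size() && !actual.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        assert(actual[i] != 0.0);  // MAPE is undefined for zero actuals
        sum += std::abs((actual[i] - forecast[i]) / actual[i]);
    }
    return sum / static_cast<double>(actual.size());
}

// Expected MAPE when every value is shifted by |N(0, stddev)| percent:
// E|X| = stddev * sqrt(2 / pi).
double expected_mape(double stddev_percent) {
    const double pi = std::acos(-1.0);
    return stddev_percent / 100.0 * std::sqrt(2.0 / pi);
}
```

For stddev values of 5, 10, 15 and 20, expected_mape gives roughly 0.040, 0.080, 0.120 and 0.160, matching the measurements listed above.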

What do you think of this approach?

Can this function be used to test a system that has to deal with predicted time series?


Solution

  • You want to generate time series data that behave like existing time series data that you have from real phenomena (weather and stock exchange). The generated data will then be fed into some system to test its stability.

    What you could do is fit a model to your existing data, and then use that model to generate data that follow the model, and hence resemble your existing data. Fitting a model to data yields a set of model parameters and a set of deviations (differences not explained by the model). The deviations may follow some known density function, but not necessarily. Given the model parameters and the deviations, you can generate data that look like the original data. Note that if the model does not explain the data well, the deviations will be large, and the data generated with the model will not look like the original data.

    For example, if you know your data is linear, you fit a line through them, and your model would be:

    y = M x + B + E
    

    where E is a random variable that follows the distribution of the error around the line that fits your data, and where M and B are the model parameters. You can now use that model to generate (x, y) coordinates that are roughly linear. When sampling the random variable E, you can either assume that it follows some known distribution, such as a normal distribution, or use a histogram of the observed deviations to generate deviations that follow an arbitrary density function.
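As an illustration of the linear case, here is a minimal sketch that fits M and B by ordinary least squares, estimates the spread of E from the residuals, and then generates new data from the fitted model. It assumes E is normally distributed with mean zero; the names LinearModel, fit_line and generate are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct LinearModel {
    double m;      // slope M
    double b;      // intercept B
    double sigma;  // standard deviation of the deviations E
};

// Ordinary least-squares fit of y = M x + B.
LinearModel fit_line(const std::vector<double>& x,
                     const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 3);
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  // the x values must not all be equal
    const double m = (n * sxy - sx * sy) / denom;
    const double b = (sy - m * sx) / n;

    // Residual sum of squares; n - 2 degrees of freedom for two parameters.
    double ss = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double r = y[i] - (m * x[i] + b);
        ss += r * r;
    }
    return {m, b, std::sqrt(ss / (n - 2.0))};
}

// Generate y = M x + B + E with E ~ N(0, sigma^2) (assumed distribution).
std::vector<double> generate(const LinearModel& model,
                             const std::vector<double>& x,
                             std::mt19937& rng) {
    std::normal_distribution<double> unit(0.0, 1.0);
    std::vector<double> y;
    y.reserve(x.size());
    for (double xi : x) {
        y.push_back(model.m * xi + model.b + model.sigma * unit(rng));
    }
    return y;
}
```

Scaling a standard normal draw by sigma, instead of constructing a distribution with sigma as its parameter, keeps the code valid even when the fit is perfect and sigma is zero.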

    There are several time series models that you could fit to your weather and stock exchange data. You could look at exponential smoothing, which comes in several variants. You can find many other models on Wikipedia.
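To give an idea of the simplest variant, here is a sketch of simple exponential smoothing with a fixed smoothing factor alpha, together with the one-step-ahead forecast errors that play the role of the deviations. Choosing alpha and selecting a richer variant (trend, seasonality) are left open; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simple exponential smoothing:
//   s[0] = x[0],  s[t] = alpha * x[t] + (1 - alpha) * s[t - 1].
std::vector<double> exponential_smoothing(const std::vector<double>& x,
                                          double alpha) {
    assert(!x.empty() && alpha > 0.0 && alpha <= 1.0);
    std::vector<double> s(x.size());
    s[0] = x[0];
    for (std::size_t t = 1; t < x.size(); ++t) {
        s[t] = alpha * x[t] + (1.0 - alpha) * s[t - 1];
    }
    return s;
}

// One-step-ahead forecast errors: s[t - 1] is the forecast for x[t].
std::vector<double> forecast_errors(const std::vector<double>& x,
                                    const std::vector<double>& s) {
    assert(x.size() == s.size());
    std::vector<double> errors;
    for (std::size_t t = 1; t < x.size(); ++t) {
        errors.push_back(x[t] - s[t - 1]);
    }
    return errors;
}
```

The collected errors can then be resampled, or summarized by a fitted distribution, to produce deviations for generated series.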

    If a model does not fit your data well, you can also treat its parameters as random variables. In our example above, suppose that we have observed data where the slope seems to change over time. We would fit several lines and obtain a distribution for M. We would then sample that variable along with E when generating data.
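One possible reading of this idea, sketched under strong assumptions: the slopes from the individual fits are summarized by their mean and standard deviation, M is assumed to be normally distributed, and a new slope is drawn at the start of each fixed-length segment. All names and the segment scheme are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct SlopeStats {
    double mean;
    double stddev;
};

// Sample mean and standard deviation of slopes from several line fits.
SlopeStats slope_stats(const std::vector<double>& slopes) {
    assert(slopes.size() >= 2);
    double mean = 0.0;
    for (double m : slopes) mean += m;
    mean /= static_cast<double>(slopes.size());
    double ss = 0.0;
    for (double m : slopes) ss += (m - mean) * (m - mean);
    return {mean, std::sqrt(ss / static_cast<double>(slopes.size() - 1))};
}

// Piecewise-linear series: a new slope M is drawn every segment_len steps
// (assumed normal), and noise E ~ N(0, e_sigma^2) is added at every step.
std::vector<double> generate_with_random_slope(const SlopeStats& m_stats,
                                               double start, double e_sigma,
                                               std::size_t n,
                                               std::size_t segment_len,
                                               std::mt19937& rng) {
    assert(segment_len > 0);
    std::normal_distribution<double> unit(0.0, 1.0);
    std::vector<double> y;
    y.reserve(n);
    double level = start;
    double slope = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i % segment_len == 0) {
            slope = m_stats.mean + m_stats.stddev * unit(rng);
        }
        y.push_back(level + e_sigma * unit(rng));
        level += slope;  // advance along the current segment
    }
    return y;
}
```

If the observed slopes are clearly not normal, sampling directly from the list of fitted slopes would be an alternative that makes no distributional assumption.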