Tags: python, keras, time-series, lstm, forecasting

Which optimization metric for differenced data when doing a multi-step forecast?


I've written an LSTM in Keras for univariate time series forecasting. I use an input window of size 48 and an output window of size 12, i.e. I predict 12 steps at once. This generally works well with an optimization metric such as RMSE.
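
For concreteness, the windowing looks roughly like this (a minimal numpy sketch; make_windows is an illustrative helper, not code from my actual pipeline):

    import numpy as np

    def make_windows(series, n_in=48, n_out=12):
        # Each sample pairs 48 consecutive inputs with the next 12 targets.
        X, y = [], []
        for i in range(len(series) - n_in - n_out + 1):
            X.append(series[i : i + n_in])
            y.append(series[i + n_in : i + n_in + n_out])
        return np.array(X), np.array(y)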

For non-stationary time series I difference the data before feeding it to the LSTM. After predicting, I take the inverse difference of the predictions.
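
A minimal numpy sketch of that round trip (toy numbers; the forecast is reconstructed by a cumulative sum anchored at the last observed level):

    import numpy as np

    series = np.array([10., 12., 15., 14., 18.])      # observed levels
    diffed = np.diff(series)                          # [ 2.,  3., -1.,  4.]

    pred_diffs = np.array([1., -2., 3.])              # hypothetical model output
    pred_levels = series[-1] + np.cumsum(pred_diffs)  # [19., 17., 20.]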

With differenced data, RMSE is not suitable as an optimization metric, because the earlier prediction steps matter much more than the later ones: when the inverse difference is applied to a 12-step forecast, an error in an early (differenced) prediction step is carried through the cumulative reconstruction into every later step.
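
To see the propagation concretely: an error in only the first differenced step shifts every reconstructed level (toy numbers again):

    import numpy as np

    true_diffs = np.array([1., 1., 1., 1.])
    pred_diffs = true_diffs.copy()
    pred_diffs[0] += 0.5                       # error in the FIRST step only

    last_level = 10.0
    true_levels = last_level + np.cumsum(true_diffs)
    pred_levels = last_level + np.cumsum(pred_diffs)
    print(pred_levels - true_levels)           # [0.5 0.5 0.5 0.5]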

So what I think I need is an optimization metric that gives the early prediction steps more weight, preferably exponentially.

Does such a metric exist already or should I write my own? Am I overlooking something?


Solution

  • I just wrote my own optimization metric; it seems to work well, certainly better than plain RMSE.

    I'm still curious what the best practice is here; I'm relatively new to forecasting.

    from tensorflow.keras import backend as K

    def weighted_rmse(y_true, y_pred):
        # Linearly decreasing weights over the forecast horizon: for a
        # 12-step output this is [12., 11., ..., 1.], so the first step
        # counts most.
        weights = K.arange(start=y_pred.get_shape()[1], stop=0, step=-1, dtype='float32')
        # Scaling both series by the weights before taking the RMSE
        # effectively weights each squared error by weights**2.
        y_true_w = y_true * weights
        y_pred_w = y_pred * weights
        return K.sqrt(K.mean(K.square(y_true_w - y_pred_w), axis=-1))
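
    Since I originally asked for exponential weighting, the same idea with exponentially decaying weights is a small change. A sketch (this variant applies the weights to the squared errors directly rather than to the series values; the decay rate 0.8 is an arbitrary illustrative choice, not tuned):

    from tensorflow.keras import backend as K

    def exp_weighted_rmse(y_true, y_pred):
        # Step i gets weight decay**i, so early steps dominate the loss.
        decay = 0.8  # illustrative value, not tuned
        steps = K.arange(start=0, stop=y_pred.get_shape()[1], dtype='float32')
        weights = K.pow(decay, steps)
        return K.sqrt(K.mean(weights * K.square(y_true - y_pred), axis=-1))

    Either version can be passed to model.compile(loss=...) like any built-in Keras loss.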