I'm making forecasts with a model whose variables were standardized as $ x_i^{\text{scaled}} = \frac{x_i - \text{mean}(x_i)}{\text{sd}(x_i)} $, and I've saved the mean and standard deviation of each variable. For out-of-sample forecasts of the target variable $ x_i $ produced by the scaled model, how do I convert the forecasts back to the original scale?
Should I use the in-sample $ \text{mean}(x_i) $ and $ \text{sd}(x_i) $ to scale the out-of-sample forecasts back, so that:
$ \text{Re-scaled out-of-sample forecast} = \text{Scaled forecast} \times \text{sd}(x_i) + \text{mean}(x_i) $
What's the appropriate procedure here?
Python example:
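import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

X = np.random.randn(100, 1) * 10 + 50  # Feature
y = 2 * X + 1 + np.random.randn(100, 1) * 5  # Target variable
# Chronological split: first 80 rows in-sample, last 20 out-of-sample
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]
# Separate scalers for the feature and the target, fitted on training data only
scaler_X = StandardScaler()
scaler_y = StandardScaler()
X_train_scaled = scaler_X.fit_transform(X_train)
y_train_scaled = scaler_y.fit_transform(y_train)
# Fit the model entirely on the scaled data
model = LinearRegression()
model.fit(X_train_scaled, y_train_scaled)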
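You should indeed use the in-sample mean and standard deviation to rescale the forecasts back to the original scale, for two reasons. First, the model was fitted to a target standardized with those statistics, so its predictions are expressed in those units, and inverting that same transformation is what maps them back to the original scale. Second, the out-of-sample mean and standard deviation of the target are not known at forecast time, and estimating them from the test set would leak information from the very values you are trying to predict.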
Rescale the predictions:
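# Scale the test features with the training statistics (transform, not fit_transform)
X_test_scaled = scaler_X.transform(X_test)
y_pred_scaled = model.predict(X_test_scaled)
# Undo the target scaling using the in-sample mean and standard deviation
y_pred = scaler_y.inverse_transform(y_pred_scaled)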
Manual rescaling:
y_pred_manual = y_pred_scaled * scaler_y.scale_ + scaler_y.mean_
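As a sanity check, the two rescaling routes can be compared numerically. The sketch below is self-contained and makes a few assumptions that are not part of the question: a fixed seed (chosen only for reproducibility), the same synthetic data-generating process, and an extra reference model fitted on the unscaled data. For ordinary least squares with an intercept, predictions rescaled with the in-sample statistics should match those of the unscaled fit up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data mirroring the example; the seed is an arbitrary assumption
rng = np.random.default_rng(0)
X = rng.normal(50, 10, size=(100, 1))
y = 2 * X + 1 + rng.normal(0, 5, size=(100, 1))
X_train, X_test = X[:80], X[80:]
y_train = y[:80]

# Fit both scalers on the in-sample data only
scaler_X = StandardScaler().fit(X_train)
scaler_y = StandardScaler().fit(y_train)

# Train on scaled data, then predict in scaled units
model = LinearRegression().fit(
    scaler_X.transform(X_train), scaler_y.transform(y_train)
)
y_pred_scaled = model.predict(scaler_X.transform(X_test))

# Route 1: let the scaler undo the transformation
y_pred = scaler_y.inverse_transform(y_pred_scaled)
# Route 2: apply scaled forecast * sd + mean by hand
y_pred_manual = y_pred_scaled * scaler_y.scale_ + scaler_y.mean_

# Reference: the same linear model fitted directly on unscaled data
y_pred_raw = LinearRegression().fit(X_train, y_train).predict(X_test)

same_as_manual = np.allclose(y_pred, y_pred_manual)
same_as_raw = np.allclose(y_pred, y_pred_raw)
print(same_as_manual, same_as_raw)
```

The agreement with the unscaled fit is specific to plain linear regression; with regularized or nonlinear models the scaled and unscaled fits generally differ, but the back-transformation with the in-sample statistics is the same.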