python scikit-learn lstm recurrent-neural-network metrics

Cross-validation with metrics.mean_squared_error raises "Found array with dim 3. Estimator expected <= 2."


I am training a many-to-many model using an LSTM RNN. I want to apply cross-validation to improve the quality of the results, but I cannot use the 'metrics.mean_squared_error' function because the system is multivariate and my targets are 3D arrays. Should I write the cross-validation scoring manually, or can I make this function work with 3D arrays?

Here are the shapes of my train and test data:

X_train1.shape, y_train1.shape, X_test1.shape, y_test1.shape

((118000, 50, 9), (118000, 1, 9), (51950, 50, 9), (51950, 1, 9))

And here is the code I have tried:

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import KFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# Cross-Validate
kf = KFold(5, shuffle=True, random_state=42) # 5 shuffled folds, reproducible via random_state
oos_y = []
oos_pred = []

fold = 0
for train, test in kf.split(X_train1):
    fold += 1
    print(f"Fold #{fold}")

    # Select this fold's rows; indexing on the first axis keeps the 3D shape
    x_train, y_train = X_train1[train], y_train1[train]
    x_test, y_test = X_train1[test], y_train1[test]

    model = Sequential()
    model.add(LSTM(128, activation='relu', input_shape=(x_train.shape[1], x_train.shape[2]), return_sequences=True))
    model.add(LSTM(64, activation='relu', return_sequences=False))
    model.add(Dropout(0.2))
    model.add(Dense(y_train.shape[2]))
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    model.summary()

    # Fit on this fold's training rows only
    history = model.fit(x_train, y_train, epochs=1, batch_size=16, validation_split=0.1, verbose=1)

    pred = model.predict(x_test)

    oos_y.append(y_test)
    oos_pred.append(pred)    

    # Measure this fold's RMSE
    score = np.sqrt(metrics.mean_squared_error(pred,y_test))
    print(f"Fold score (RMSE): {score}")

# Build the oos prediction list and calculate the error.
oos_y = np.concatenate(oos_y)
oos_pred = np.concatenate(oos_pred)
score = np.sqrt(metrics.mean_squared_error(oos_pred,oos_y))
print(f"Final, out of sample score (RMSE): {score}")    
    
# Write the cross-validated prediction
oos_y = pd.DataFrame(oos_y)
oos_pred = pd.DataFrame(oos_pred)
oosDF = pd.concat( [df, oos_y, oos_pred],axis=1 )
#oosDF.to_csv(filename_write,index=False)


Solution

  • If y_true and y_pred both have shape (51950, 1, 9), reshape them to (51950, 9) and compute the RMSE with:

    rmse = mean_squared_error(
        y_true.reshape(-1, 1*9),
        y_pred.reshape(-1, 1*9),
        # squared=False returns the RMSE instead of the MSE. The argument is
        # deprecated since scikit-learn 1.4, which adds root_mean_squared_error.
        squared=False,
    )
    

    Or:

    ex, d1, d2 = y_true.shape
    y_true = y_true.reshape(ex, d1*d2)
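
A minimal, self-contained sketch of the reshape approach, using random arrays as stand-ins for the real targets and predictions. The helper name rmse_3d and the array sizes are illustrative assumptions, not part of the original answer. It takes np.sqrt of the MSE rather than passing squared=False, so it does not depend on that argument being available. For multi-output data the two can give slightly different values, since this version averages the squared errors over all outputs before taking the root.

```python
import numpy as np
from sklearn.metrics import mean_squared_error


def rmse_3d(y_true, y_pred):
    """RMSE for arrays whose first axis is the sample axis (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Collapse every axis after the first so scikit-learn receives 2D arrays:
    # (samples, 1, 9) becomes (samples, 9); an array that is already 2D is unchanged.
    y_true_2d = y_true.reshape(y_true.shape[0], -1)
    y_pred_2d = y_pred.reshape(y_pred.shape[0], -1)
    # Square root of the MSE averaged over all outputs.
    return float(np.sqrt(mean_squared_error(y_true_2d, y_pred_2d)))


# Synthetic stand-ins: 3D targets and 2D predictions (as a Dense(9) head would emit).
rng = np.random.default_rng(0)
y_true = rng.normal(size=(100, 1, 9))
y_pred_model = rng.normal(size=(100, 9))

score = rmse_3d(y_true, y_pred_model)
print(f"RMSE: {score}")
```

Inside a cross-validation loop, the same helper could be called on each fold's y_test and pred in place of the direct metrics.mean_squared_error call.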