What is the difference between the mean squared error computed in these two pieces of code? How can I compare metric='mse' to mean_squared_error?

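from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rf_reg = RandomForestRegressor(
    n_estimators=500,
    max_depth=30,
    criterion="mse",  # renamed to "squared_error" in scikit-learn 1.0; "mse" was removed in 1.2
    max_features=6,
    n_jobs=-1,
    random_state=1,
)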

rf_reg.fit(x_train, y_train)

train_pred_y = rf_reg.predict(x_train)
test_pred_y = rf_reg.predict(x_test)

print(f"train_MSE = {mean_squared_error(y_train, train_pred_y)}")
print(f"test_MSE = {mean_squared_error(y_test, test_pred_y)}")

and

automl.fit(X_train, y_train, task="regression", metric="mse", time_budget=3600)

Solution

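metric='mse' and mean_squared_error compute the same quantity, the mean of the squared differences between the true and the predicted values (they are probably aliases of each other, but that's only a guess).

Given the same data and the same predictions, mean_squared_error(y_train, train_pred_y) and metric='mse' in your .fit() call should therefore give you exactly the same result.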
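To make "the same quantity" concrete, here is a minimal sketch of the MSE formula in plain Python. The helper name mse and the sample values are made up for illustration; this is not the code of scikit-learn or of any AutoML library, only the textbook definition both are expected to follow.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values, purely for illustration.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Squared differences are 0.25, 0.25, 0.0 and 1.0, so the mean is 1.5 / 4.
print(mse(y_true, y_pred))  # 0.375
```

Any two implementations of this formula can only disagree if they are fed different data or different predictions.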

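What metric='mse' does is report the current model's MSE while training is running; in a Keras-style training loop it is printed for the training data after each epoch. An AutoML fit() like yours will typically also use it as the score for comparing candidate models, possibly on a validation split rather than on the full training set, so check your library's documentation for that detail. You could change it to MAE (mean absolute error), for example, so that you get a second metric in your training output instead of the same one twice ;)

The training output could look similar to this: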

Epoch 1/15
108/108 [==============================] - 4s 19ms/step - loss: 0.0173 - mse: 0.0173
Epoch 2/15
108/108 [==============================] - 1s 14ms/step - loss: 0.0143 - mse: 0.0143
...

print(f"train_MSE = {mean_squared_error(y_train, train_pred_y)}") prints the MSE on the training set again (it should be equal, or at least very close, to the MSE already printed for the last training epoch).

The last line calculates the MSE once more, but this time the model makes its predictions on the test set, which it has not seen during training. Hence train_MSE and test_MSE will usually differ.
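As a rough illustration of the train/test gap, the sketch below fits a small random forest on synthetic data and computes both values. The noisy sine data, the split size and the hyperparameters are assumptions chosen only for the demo; the default criterion is used so that the snippet does not depend on the scikit-learn version.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical, for illustration only): a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Same metric, different data: rows seen during fitting vs. held-out rows.
train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))

print(f"train_MSE = {train_mse:.4f}")
print(f"test_MSE  = {test_mse:.4f}")
```

Because the trees have partly memorized the noise in the training rows, the training MSE should come out clearly lower than the test MSE here.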

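And last but not least, criterion="mse" is the function your model tries to minimize during training ;) For a RandomForestRegressor that means each tree chooses the splits that reduce the squared error of its predictions the most.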
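To show what "minimizing squared error" means for a tree, here is a simplified sketch of choosing a single split on one feature. The functions sum_squared_error and best_split are hypothetical helpers written for this answer; scikit-learn's real implementation is considerably more involved, so treat this only as the idea behind the criterion.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_error) of the split minimizing squared error."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Each side predicts its own mean; add up the error of both sides.
        error = sum_squared_error(left) + sum_squared_error(right)
        if error < best[1]:
            best = (threshold, error)
    return best


# Made-up data: the target jumps between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

threshold, error = best_split(xs, ys)
print(threshold, error)  # 3.5 0.0
```

The criterion therefore steers how the trees are built, while mean_squared_error and metric='mse' only measure the finished predictions.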