I'm working on a typical machine learning regression problem. There are 800 data points and 6 features. The best model, an Extra Trees Regressor, returns a Root Mean Square Error of 30. I take a log transformation of the target to make the extreme values less influential; the log also changes the distribution from right-skewed to roughly normal. After the transform the error is only 0.54, so why such a drastic change?
Even np.log(30) ≈ 3.4 (natural log), which is nowhere near 0.54. I understand my statistics knowledge is not the best, but this seems quite strange to me. I haven't done any tuning of the parameters.
With that being said, which error should I believe, and how do I interpret each one?
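To make the comparison concrete, here is a small toy example with made-up numbers (not my data), showing that the RMSE of log-transformed values is not simply the log of the original RMSE:
import numpy as np

# Made-up targets and predictions, purely for illustration
y_true = np.array([10.0, 50.0, 200.0, 900.0])
y_pred = np.array([12.0, 40.0, 260.0, 700.0])

rmse_original = np.sqrt(np.mean((y_true - y_pred) ** 2))              # ~105, dominated by the largest target
rmse_log = np.sqrt(np.mean((np.log(y_true) - np.log(y_pred)) ** 2))   # ~0.23, error in log units
print(rmse_original, np.log(rmse_original), rmse_log)                 # log(~105) is ~4.6, still nothing like 0.23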
Take a log of the target values:
pricing['runtime.min'] = np.log(pricing['runtime.min'])
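(A slightly more defensive version of the same step, in case it helps reproduce this: np.log is undefined for values <= 0, and keeping a copy of the original-scale column makes it easy to report errors in minutes later. The column name runtime.min.orig is just an illustrative choice, not something from my pipeline.)
import numpy as np

assert (pricing['runtime.min'] > 0).all()             # np.log is undefined for values <= 0
pricing['runtime.min.orig'] = pricing['runtime.min']  # keep the original scale for later comparison
pricing['runtime.min'] = np.log(pricing['runtime.min'])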
Function to evaluate a model:
import numpy as np
from sklearn import metrics

def evaluate(model, test_features, test_labels):
    predictions = model.predict(test_features)
    # Mean Absolute Error
    errors = metrics.mean_absolute_error(test_labels, predictions)
    # Mean Squared Error
    MSerrors = metrics.mean_squared_error(test_labels, predictions)
    # Root Mean Squared Error
    RMSE = np.sqrt(metrics.mean_squared_error(test_labels, predictions))
    print('Model Performance')
    print('Average MAE Error: {:0.4f} degrees. '.format(errors))
    print('Average MSE Error: {:0.4f} degrees. '.format(MSerrors))
    print('Average RMS Error: {:0.4f} degrees. '.format(RMSE))
    return 'end of test'
Extra Trees regressor:
from sklearn.ensemble import ExtraTreesRegressor

# SklearnExtra is a custom wrapper class around scikit-learn estimators (defined elsewhere)
et_params = {'n_estimators': 1000, 'max_features': 2}
et = SklearnExtra(clf=ExtraTreesRegressor(), seed=Seed, params=et_params)
et.fit(x_train, y_train)
base_models = [rf, et, gb, ada, xg]

for i in base_models:
    print('Model ' + i.name())
    print('Training: ' + str(evaluate(i, x_train, y_train)))
    print('')
    print('Model ' + i.name())
    print('Test: ' + str(evaluate(i, x_test, y_test)))
    print('Test MAPE ' + str(mean_absolute_percentage_error(i, y_test, x_test)))
Model ExtraTreesRegressor(bootstrap=False, criterion='mse', max_depth=None,
max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
Model Performance
Average MAE Error: 0.0165 degrees.
Average MSE Error: 0.0079 degrees.
Average RMS Error: 0.0887 degrees.
Training: end of test
Model Performance
Average MAE Error: 0.3572 degrees.
Average MSE Error: 0.2957 degrees.
Average RMS Error: 0.5438 degrees.
Test: end of test
We can't tell you whether you should do something; you have to decide whether it makes sense for your data.
That's why I asked in the comments about the transform and the statistics: once you change the scale of your targets, you can't compare the two models by the absolute size of their errors. RMSE is expressed in the units of the target, so it is only meaningful relative to the spread of that target. If your initial model has an RMSE of 30 while the targets range from 0 to 100 with a standard deviation of 20 (for example), that's not great. But if, after the transform, the targets range from 0 to 10 with a standard deviation of 3.5, then an RMSE of 0.5 might be better. A quick way to check is to normalise each RMSE by the spread of its own targets, as in the sketch below.
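A minimal sketch, assuming y_test_raw and y_test_log are placeholders for your original-scale and log-scale test targets (names not taken from your code), and using the two RMSE values you reported:
import numpy as np

def normalised_rmse(rmse, y):
    # Express the RMSE relative to the spread of the targets it was computed on
    y = np.asarray(y, dtype=float)
    return {'rmse/std': rmse / y.std(), 'rmse/range': rmse / (y.max() - y.min())}

print(normalised_rmse(30.0, y_test_raw))    # original-scale model
print(normalised_rmse(0.5438, y_test_log))  # log-scale model
If the two normalised numbers come out similar, the "drastic change" is mostly a change of units rather than a real improvement.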
The correct answer is a little bit subjective, but it boils down to this: if you make predictions on real-world data using the model in question, is the error metric within acceptable tolerances for the task at hand? For your initial model, is your business case such that a difference of 30 (seconds? minutes?) between predicted and actual runtime is "close enough"? For the second model, is an error of 0.54 in log units enough to render your predictions useless? If you want both errors in the same units, you can also undo the transform before scoring, as sketched below.
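A sketch, assuming et is the model trained on the log-transformed targets and x_test / y_test are the corresponding (log-scale) test split; exponentiating puts the error back in the original units so it can be compared with the RMSE of 30. Bear in mind that exponentiating log-scale predictions tends to under-estimate the conditional mean, so treat this as a rough comparison rather than an exact one.
import numpy as np
from sklearn import metrics

log_predictions = et.predict(x_test)  # predictions on the log scale
rmse_original_units = np.sqrt(metrics.mean_squared_error(np.exp(y_test),
                                                          np.exp(log_predictions)))
print('RMSE in original units: {:0.2f}'.format(rmse_original_units))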
See the "Useful" part of "All models are wrong, some are useful"