I built a binary classifier with H2O. I split my data into three sets: train, calibration and test. After training and calibration I checked the results on the test set. Here is the corresponding part:
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch

# hyper_params_gbm, search_criteria and valid are defined elsewhere
final_grid = H2OGridSearch(
    model=H2OGradientBoostingEstimator(
        model_id='contract_gbm2', stopping_rounds=5, stopping_tolerance=1e-4,
        seed=23, stopping_metric="AUC", balance_classes=True,
        max_runtime_secs=300, calibrate_model=True, calibration_frame=valid,
        nfolds=5),
    hyper_params=hyper_params_gbm,
    search_criteria=search_criteria)
What I have noticed is that the predicted class and the given probabilities are not always consistent. See below:
As shown, the predicted class is not always the one with the highest probability. What am I missing?
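For binary classification, H2O picks the predicted class by comparing p1 with the threshold that maximizes F1, not with 0.5. A row can therefore be labeled 1 even when p1 is smaller than p0.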
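To see how a max-F1 threshold can end up well below 0.5, here is a minimal sketch in plain Python. It is not H2O's implementation: the helper names `f1_at` and `max_f1_threshold` and the toy data are made up for illustration, and I am assuming a row is labeled 1 when p1 >= threshold.

```python
def f1_at(threshold, y_true, p1):
    """F1 score when every row with p1 >= threshold is labeled 1."""
    tp = sum(1 for y, p in zip(y_true, p1) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, p1) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, p1) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def max_f1_threshold(y_true, p1):
    """Try each distinct score as a threshold and keep the one with the best F1."""
    candidates = sorted(set(p1))
    return max(candidates, key=lambda t: f1_at(t, y_true, p1))


# Hypothetical, imbalanced toy data: the positives get low scores.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
p1 = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.45]

best = max_f1_threshold(y_true, p1)
print(best)
```

With this toy data the best threshold is 0.30, so a row with p0 = 0.70 and p1 = 0.30 would still be labeled 1.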
If you don't like that threshold, you can of course compare p1 with whatever threshold you prefer.
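A minimal sketch of relabeling with your own threshold, again in plain Python. The function name `relabel` and the sample values are hypothetical; in practice the p1 values would come from the p1 column of the frame returned by `predict`. As far as I know, the threshold H2O itself uses can be inspected with the model's `find_threshold_by_max_metric("f1")`, but check that against your H2O version.

```python
def relabel(p1_values, threshold=0.5):
    """Label a row 1 when its p1 is at or above the chosen threshold."""
    return [1 if p >= threshold else 0 for p in p1_values]


# Hypothetical p1 values standing in for the p1 column of the predictions.
p1_values = [0.30, 0.45, 0.62, 0.10]

labels = relabel(p1_values, threshold=0.5)
print(labels)
```

With a threshold of 0.5 the predicted class always matches the larger of p0 and p1, which is the behavior the question expected.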