Apologies if this has been answered elsewhere, but I couldn't find anything.
I'm using h2o (latest release) in R. I've created a random forest model using h2o.grid (for parameter tuning) and called it 'my_rf'.
My steps are as follows. The exact line I've used for h2o.performance is:
h2o.performance(my_rf, newdata = as.h2o(test))
.... which gives me a confusion matrix, from which I can calculate accuracy (as well as AUC, the max F1 score, etc.).
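For context, this is roughly how I'm reading the numbers off the performance object (a sketch: perf is just the object returned by h2o.performance, and the accessors are the standard h2o metric functions as I understand them from the docs):
perf <- h2o.performance(my_rf, newdata = as.h2o(test))
h2o.auc(perf)              # AUC on the test frame
h2o.confusionMatrix(perf)  # confusion matrix (reported at the max-F1 threshold by default)
h2o.F1(perf)               # F1 at each threshold; max of the f1 column gives the max F1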
I would have thought that using
h2o.predict(my_rf, newdata = as.h2o(test))
I would be able to replicate the confusion matrix from h2o.performance. But the accuracy is different - 3% worse in fact.
Is anyone able to explain why this is so?
Also, is there any way to return the predictions that make up the confusion matrix in h2o.performance?
Edit: here is the relevant code:
library(mlbench)
data(Sonar)
head(Sonar)
mainset <- Sonar
mainset$Class <- ifelse(mainset$Class == "M", 0, 1)  # binarize the response to 0/1
mainset$Class <- as.factor(mainset$Class)
response <- "Class"
predictors <- setdiff(names(mainset), c(response, "name"))
# split into training and test set
library(caTools)
set.seed(123)
split = sample.split(mainset[,61], SplitRatio = 0.75)
train = subset(mainset, split == TRUE)
test = subset(mainset, split == FALSE)
# connect to h2o
Sys.unsetenv("http_proxy")
Sys.setenv(JAVA_HOME='C:\\Program Files (x86)\\Java\\jre7') #set JAVA home for 32 bit
library(h2o)
h2o.init(nthreads = -1)
# random forest grid search (keeping CV predictions for stacked ensembles later)
nfolds <- 5
ntrees_opts <- c(20:500)
max_depth_opts <- c(4,8,12,16,20)
sample_rate_opts <- seq(0.3,1,0.05)
col_sample_rate_opts <- seq(0.3,1,0.05)
rf_hypers <- list(ntrees = ntrees_opts, max_depth = max_depth_opts,
sample_rate = sample_rate_opts,
col_sample_rate_per_tree = col_sample_rate_opts)
search_criteria <- list(strategy = 'RandomDiscrete', max_runtime_secs = 240, max_models = 15,
stopping_metric = "AUTO", stopping_tolerance = 0.00001, stopping_rounds = 5,seed = 1)
my_rf <- h2o.grid("randomForest", grid_id = "rf_grid", x = predictors, y = response,
training_frame = as.h2o(train),
nfolds = nfolds,
fold_assignment = "Modulo",
keep_cross_validation_predictions = TRUE,
hyper_params = rf_hypers,
search_criteria = search_criteria)
get_grid_rf <- h2o.getGrid(grid_id = "rf_grid", sort_by = "auc", decreasing = TRUE) # get grid of models built
my_rf <- h2o.getModel(get_grid_rf@model_ids[[1]])
perf_rf <- h2o.performance(my_rf, newdata = as.h2o(test))
pred <- h2o.predict(my_rf, newdata = as.h2o(test))
pred <- as.vector(pred$predict)
cm <- table(test[,61], pred)
print(cm)
Most likely, h2o.performance is using the max-F1 threshold to decide between yes and no. If you take the h2o.predict results and build the table yourself, separating yes/no based on the model's max-F1 threshold value, you will see the numbers almost match. I believe this is the main reason you see a discrepancy between h2o.performance and h2o.predict.
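Something along these lines (an untested sketch, assuming the p1 probability column that h2o.predict returns for binomial models and the h2o.find_threshold_by_max_metric accessor from the h2o R API):
perf_rf  <- h2o.performance(my_rf, newdata = as.h2o(test))
thr      <- h2o.find_threshold_by_max_metric(perf_rf, "f1")  # max-F1 threshold used for the confusion matrix
pred     <- h2o.predict(my_rf, newdata = as.h2o(test))
p1       <- as.vector(pred$p1)                               # predicted probability of class "1"
pred_cls <- ifelse(p1 > thr, 1, 0)                           # relabel each row with that threshold
table(actual = test[, 61], predicted = pred_cls)
This should line up much more closely with the confusion matrix printed by h2o.performance than the raw predict labels do.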