I am trying to run a grid search over H2O's unsupervised isolation forest in R. Here is my code:
Accesses.hex <- as.h2o(Accesses)
x <- names(Accesses.hex)
seed <- 12345
# Model hyperparameters
hyper_params <- list(ntrees = c(50, 100, 150, 200),
                     max_depth = c(8, 15, 20, 30), # default is 8
                     sample_size = c(128, 256, 512))
# Early stopping criteria
search_criteria <- list(strategy = "RandomDiscrete",
                        max_models = 100,
                        max_runtime_secs = 4000,
                        stopping_rounds = 15,
                        seed = seed)
model.grid <- h2o.grid(algorithm = "isolationForest",
                       x = x,
                       grid_id = "model_grid",
                       training_frame = Accesses.hex,
                       hyper_params = hyper_params,
                       search_criteria = search_criteria,
                       seed = seed)
However, I got an error saying:
Error in h2o.grid(algorithm = "isolationForest", x = x, grid_id = "model_grid", :
Must specify response, y
I am using isolation forest for unsupervised learning here, so I don’t have the response variable y. Is it possible to do a grid search within H2O in this case?
My computer: OS X 10.14.6, 16 GB memory
H2O cluster version: 3.30.0.1
H2O cluster total nodes: 1
H2O cluster total memory: 15.00 GB
H2O cluster total cores: 16
H2O cluster allowed cores: 16
H2O cluster healthy: TRUE
R Version: R version 3.6.3 (2020-02-29)
Please let me know if there is any other information I can provide. Thanks for your help!
This will not work with the current design: h2o.grid insists on a response column y, and an isolation forest has none. Grid search support for isolation forest is currently in development and is targeted for release in 3.30.1.1, according to this Jira.
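In the meantime, one possible workaround is to loop over the hyperparameter combinations yourself and call h2o.isolationForest directly, which does not need y. The following is only a minimal sketch, not a replacement for h2o.grid: the helper name manual_iforest_grid and the model IDs are made up for illustration, it trains every combination (a full Cartesian search, not RandomDiscrete), and it applies no early stopping or runtime limit.

```r
# Hypothetical helper (not part of the h2o package): trains one isolation
# forest per hyperparameter combination as a stand-in for h2o.grid().
# Assumes hyper_params contains ntrees, max_depth and sample_size.
manual_iforest_grid <- function(frame, x, hyper_params, seed = 12345) {
  # One row per combination of the supplied hyperparameter values
  combos <- expand.grid(hyper_params, stringsAsFactors = FALSE)
  models <- vector("list", nrow(combos))

  for (i in seq_len(nrow(combos))) {
    models[[i]] <- h2o::h2o.isolationForest(
      training_frame = frame,
      x = x,
      model_id = paste0("iforest_", i),  # illustrative ID scheme
      ntrees = combos$ntrees[i],
      max_depth = combos$max_depth[i],
      sample_size = combos$sample_size[i],
      seed = seed
    )
  }

  # Return the settings alongside the fitted models so they can be matched up
  list(combos = combos, models = models)
}

# Usage with the objects from the question (requires a running H2O cluster):
# result <- manual_iforest_grid(Accesses.hex, x, hyper_params, seed)
# result$combos       # the 4 * 4 * 3 = 48 settings that were tried
# result$models[[1]]  # the model trained with the first setting
```

Because there are no labels, nothing here ranks the models for you. You would still need your own criterion for comparing them, for example checking the anomaly scores against a small set of known outliers if you have one.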