This question draws heavily on the solution to this question as a jumping-off point. Given that I can use R to produce a MOJO model object:
library(h2o)
h2o.init()
airlinedf <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
airlinemodel <- h2o.gbm(model_id = "airlinemodel",
                        training_frame = airlinedf,
                        x = c("Year", "Month", "DayofMonth", "DayOfWeek", "UniqueCarrier"),
                        y = "IsDepDelayed",
                        max_depth = 3,
                        ntrees = 5)
h2o.download_mojo(airlinemodel, getwd(), FALSE)
And bash/graphviz to produce a tree diagram of that model:
java -cp h2o.jar hex.genmodel.tools.PrintMojo --tree 0 -i airlinemodel.zip -o airlinemodel.gv
dot -Tpng airlinemodel.gv -o airlinemodel.png
How do I interpret the split decisions in this visualization and the values at the terminal nodes? What are the NAs in the second tier? If the values in the terminal nodes are "class probabilities", how can they be negative?
Is there a way to visualize or conceptualize a "summary tree" of all the trees in the model?
How can I produce a diagram that uses color or shape to indicate the binary classification assignment at each terminal node?
There is a better way to inspect and visualize the decision trees of an H2O model - without extracting MOJOs or leaving R/Python - using the new Tree API (available starting with H2O 3.22.0.1).
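As a quick illustration (not the full answer), here is a minimal sketch of fetching a single tree through the Tree API in R. It assumes the airlinemodel GBM from the question is still loaded in the H2O cluster; h2o.getModelTree returns an H2OTree object, and the slots shown below are the ones I typically look at (check the class documentation for the full list):

library(h2o)
# fetch the first tree directly from the model - no MOJO export required
# (in the R interface, tree_number starts at 1)
airlinetree <- h2o.getModelTree(model = airlinemodel, tree_number = 1)

# inspect the structure of the returned H2OTree object
airlinetree@features      # split feature at each node (NA for terminal nodes)
airlinetree@thresholds    # numeric split thresholds
airlinetree@predictions   # node predictions (log-odds contributions for a binomial GBM)
airlinetree@descriptions  # human-readable description of each node

For comprehensive explanations see: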