r graphviz h2o gbm

Classification Tree Diagram from H2O Mojo/Pojo


This question uses the solution to an earlier question as its starting point. Given that I can use R to produce a MOJO model object:

library(h2o)
h2o.init()
airlinedf <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
airlinemodel <- h2o.gbm(model_id = "airlinemodel",
                training_frame = airlinedf,
                x = c("Year", "Month", "DayofMonth", "DayOfWeek", "UniqueCarrier"),
                y = "IsDepDelayed",
                max_depth = 3,
                ntrees = 5)
h2o.download_mojo(airlinemodel, getwd(), FALSE)

And bash/graphviz to produce a tree diagram of that model:

java -cp h2o.jar hex.genmodel.tools.PrintMojo --tree 0 -i airlinemodel.zip -o airlinemodel.gv
dot -Tpng airlinemodel.gv -o airlinemodel.png

[Image: Example GBM Tree Diagram]

My question is threefold:

  1. How do I explain the decisions in this visualization and the values at the terminal nodes? What are the NAs in the second tier? If the values in the terminal nodes are "class probabilities", how can they be negative?

  2. Is there a way to visualize or conceptualize a "summary tree" of all the trees in the model?

  3. How can I produce a diagram that uses color or shape to indicate the binary classification assignments of items in the end nodes?


Solution

  • There is a better way to inspect and plot decision trees with H2O, without extracting MOJOs or leaving R/Python: the new Tree API (available starting with 3.22.0.1). For comprehensive explanations, see:

    1. Inspecting Decision Trees with H2O
    2. Finally, You can Plot H2O Decision Trees in R
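
As a rough illustration of the Tree API mentioned above, the following is a minimal sketch rather than a definitive recipe. It assumes h2o 3.22.0.1 or later and a running local cluster, and it reuses the data and model from the question. The `nodes` data frame is just a hypothetical convenience for viewing the tree; the slot names (`node_ids`, `features`, `thresholds`, `nas`, `predictions`, `descriptions`) are those of the `H2OTree` object as I understand it, so check them against your installed version.

```r
library(h2o)
h2o.init()

# Same data and model as in the question.
airlinedf <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
airlinemodel <- h2o.gbm(model_id = "airlinemodel",
                        training_frame = airlinedf,
                        x = c("Year", "Month", "DayofMonth", "DayOfWeek", "UniqueCarrier"),
                        y = "IsDepDelayed",
                        max_depth = 3,
                        ntrees = 5)

# Fetch the first tree directly from the model, with no MOJO export.
# tree_number is 1-based here, whereas PrintMojo's --tree flag above used 0.
tree <- h2o.getModelTree(model = airlinemodel, tree_number = 1)

# Assumption: each slot below is a vector with one entry per node,
# so they can be lined up side by side in a data frame.
nodes <- data.frame(
  node_id      = tree@node_ids,
  feature      = tree@features,      # split column (NA for terminal nodes)
  threshold    = tree@thresholds,    # numeric split point, where applicable
  na_direction = tree@nas,           # branch that missing values follow
  prediction   = tree@predictions,   # value stored at the node
  description  = tree@descriptions,  # human-readable summary of the node
  stringsAsFactors = FALSE
)
print(nodes)
```

The `na_direction` and `prediction` columns correspond to the NA labels and terminal-node values in the Graphviz diagram, so the table can be read alongside the picture. The linked articles show how to turn such an object into a plot.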