How to get J48 size and number of leaves


If I build a J48 tree by:

library(RWeka)

fit <- J48(Species~., data=iris)

I get the following result:

> fit
J48 pruned tree
------------------

Petal.Width <= 0.6: setosa (50.0)
Petal.Width > 0.6
|   Petal.Width <= 1.7
|   |   Petal.Length <= 4.9: versicolor (48.0/1.0)
|   |   Petal.Length > 4.9
|   |   |   Petal.Width <= 1.5: virginica (3.0)
|   |   |   Petal.Width > 1.5: versicolor (3.0/1.0)
|   Petal.Width > 1.7: virginica (46.0/1.0)

Number of Leaves  :     5

Size of the tree :  9

I would like to get the Number of Leaves into a variable N (so N will get 5) and the Size of the tree to S (so S will get 9).

Is there a way to get this information directly from J48 tree?


Solution

  • As previously pointed out by @LyzandeR, it is not easy to do this on the J48 object directly. The objects returned by the fitting functions in RWeka usually contain relatively little information on the R side (e.g., only the call and the fitted predictions). The main ingredient is typically a reference to the Java object built by Weka, to which Weka's own methods can be applied on the Java side via .jcall, with the result then returned to R.
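
    To illustrate what calling a Weka method via .jcall looks like, here is a minimal sketch. It rests on two assumptions that are not stated above: that the fitted object keeps its Java reference in fit$classifier, and that Weka's J48 class exposes measureNumLeaves() and measureTreeSize(), both returning a Java double. If either differs in your RWeka/Weka version, the calls will fail.

    ```r
    library("RWeka")
    library("rJava")

    fit <- J48(Species ~ ., data = iris)

    # Assumption: the Java-side classifier is stored in fit$classifier.
    jclassifier <- fit$classifier

    # Assumption: J48 provides these two methods; "D" is the JNI code
    # for a double return value.
    N <- .jcall(jclassifier, "D", "measureNumLeaves")
    S <- .jcall(jclassifier, "D", "measureTreeSize")
    ```

    This ties the code to Weka's Java API rather than to an R-level interface, so it needs to be adapted for other Weka learners.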

    However, for J48 trees it is easy to transform the information from the Java side into an R object for which standard functions and methods are available. The partykit package provides a coercion function, as.party(), that transforms J48 trees into constparty objects (recursive partitions with constant fits in the leaves). The methods length(), width(), and depth() then return the number of nodes, the number of leaves, and the depth of the tree, respectively.

    library("RWeka")
    fit <- J48(Species ~ ., data = iris)
    library("partykit")
    p <- as.party(fit)
    length(p)
    ## [1] 9
    width(p)
    ## [1] 5
    depth(p)
    ## [1] 4
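
    To store the two values asked for in the question, the same calls can simply be assigned. This is a small sketch using the variable names N and S from the question:

    ```r
    library("RWeka")
    library("partykit")

    fit <- J48(Species ~ ., data = iris)
    p <- as.party(fit)

    N <- width(p)   # number of leaves (terminal nodes)
    S <- length(p)  # size of the tree (total number of nodes)
    ```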
    

    Furthermore, predict(), plot(), print() and many other tools are available for the party object.
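
    As a brief sketch of these tools, the converted tree can be used for prediction and plotting like any other party object. The names pred and confusion are only illustrative:

    ```r
    library("RWeka")
    library("partykit")

    fit <- J48(Species ~ ., data = iris)
    p <- as.party(fit)

    # Predicted class for each observation (here: the training data).
    pred <- predict(p, newdata = iris)

    # Cross-tabulate predicted against observed species.
    confusion <- table(predicted = pred, observed = iris$Species)
    print(confusion)

    # Draw the tree with the class distribution in each leaf.
    plot(p)
    ```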

    I would recommend this approach over the text parsing suggested by @LyzandeR because the as.party conversion does not rely on parsing the printed output, which is potentially error-prone. Instead, it internally calls Weka's own graph generator (via .jcall) and then parses that graph into the constparty structure.