
Properties and their values out of J48 tree (RWeka)


If you run the following:

library(RWeka) 
data(iris) 
res = J48(Species ~., data = iris)

res will be a list of class J48 inheriting from Weka_tree. If you print it, you get:

R> res
J48 pruned tree
------------------

Petal.Width <= 0.6: setosa (50.0)
Petal.Width > 0.6
|   Petal.Width <= 1.7
|   |   Petal.Length <= 4.9: versicolor (48.0/1.0)
|   |   Petal.Length > 4.9
|   |   |   Petal.Width <= 1.5: virginica (3.0)
|   |   |   Petal.Width > 1.5: versicolor (3.0/1.0)
|   Petal.Width > 1.7: virginica (46.0/1.0)

Number of Leaves  :     5

Size of the tree :  9

I would like to extract the properties and their values in order, from right to left. For this case:

Petal.Width, Petal.Width, Petal.Length, Petal.Length.

I tried converting res to a factor and running the command:

str_extract(paste0(x, collapse=""), perl("(?<=\\|)[A-Za-z]+(?=\\|)"))

with no success. Keep in mind that the surrounding characters should be ignored.


Solution

  • One way to do this is to convert the J48 object from RWeka to a party object from partykit. You just need to call as.party(res); this does all the parsing for you and returns a structure that is easier to work with, with standardized extractor functions etc.

    In particular you can then use all advice given in other discussions about ctree objects etc.

    And I think the following should do at least part of what you want:

    library("partykit")
    pres <- as.party(res)
    partykit:::.list.rules.party(pres)
    ##                                                                                  2 
    ##                                                               "Petal.Width <= 0.6" 
    ##                                                                                  5 
    ##                     "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length <= 4.9" 
    ##                                                                                  7 
    ## "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width <= 1.5" 
    ##                                                                                  8 
    ##  "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width > 1.5" 
    ##                                                                                  9 
    ##                                            "Petal.Width > 0.6 & Petal.Width > 1.7" 
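
    If only the variable names along each path are needed, the rule strings above can be split with base R. The following is a minimal sketch, not part of partykit: the rules vector is copied by hand from the output above so that the snippet runs without RWeka, and split_vars is a hypothetical helper name.

    ```r
    # Rule strings copied from the output above. In a live session they would
    # come from partykit:::.list.rules.party(pres) instead.
    rules <- c(
      "2" = "Petal.Width <= 0.6",
      "5" = "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length <= 4.9",
      "7" = "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width <= 1.5",
      "8" = "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width > 1.5",
      "9" = "Petal.Width > 0.6 & Petal.Width > 1.7"
    )

    # Hypothetical helper: split one rule into its conditions, then drop the
    # comparison operator and everything after it, leaving the variable name.
    split_vars <- function(rule) {
      conds <- strsplit(rule, " & ", fixed = TRUE)[[1]]
      sub("[[:space:]]*(<=|>=|<|>).*$", "", conds)
    }

    # One character vector of variable names per terminal node, root first
    vars_by_leaf <- lapply(rules, split_vars)
    vars_by_leaf[["7"]]
    ## [1] "Petal.Width"  "Petal.Width"  "Petal.Length" "Petal.Width"
    ```

    Each element lists the splitting variables from the root down to that leaf, so rev() can be applied if the opposite order is wanted.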
    

    Update: The OP contacted me off-list with a related question, asking for a specific printed representation of the tree. I'm including my solution here in case it is useful for someone else.

    He wanted to have ( ) symbols signalling the hierarchy levels plus the names of the splitting variables. One way to do so would be to (1) extract variable names of the underlying data:

    nam <- names(pres$data)
    

    (2) Turn the recursive node structure of the tree into a flat list (which is somewhat more convenient for constructing the desired string):

    tr <- as.list(pres$node)
    

    (3a) Initialize the string:

    str <- "("
    

    (3b) Recursively add brackets and/or variable names to the string:

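    update_str <- function(x) {
       if(is.null(x$kids)) {
         ## terminal node: append a closing bracket
         str <<- paste(str, ")")
       } else {
         ## inner node: append the split variable and an opening bracket
         str <<- paste(str, nam[x$split$varid], "(")
         ## then visit each kid via its index in the flat list
         for(i in x$kids) update_str(tr[[i]])
       }
    }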
    

    (3c) Call the recursion, starting from the root node:

    update_str(tr[[1]])
    str
    ## [1] "( Petal.Width ( ) Petal.Width ( Petal.Length ( ) Petal.Width ( ) ) )"
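
    The same recursion can also be written without the global assignment via <<-, by letting each call return its piece of the string. This is only a sketch of that alternative: nam and tr below are hand-built stand-ins so that the snippet runs without RWeka and partykit, the varid values are made up to match this stand-in name vector, and node and build_str are hypothetical helper names.

    ```r
    # Hypothetical stand-ins for nam and tr from steps (1) and (2). Only the
    # fields used by the recursion (split$varid and kids) are filled in.
    nam <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
    node <- function(varid = NULL, kids = NULL) {
      list(split = if (!is.null(varid)) list(varid = varid), kids = kids)
    }
    tr <- list(
      node(4L, c(2L, 3L)),    # 1: Petal.Width
      node(),                 # 2: terminal
      node(4L, c(4L, 9L)),    # 3: Petal.Width
      node(3L, c(5L, 6L)),    # 4: Petal.Length
      node(),                 # 5: terminal
      node(4L, c(7L, 8L)),    # 6: Petal.Width
      node(), node(), node()  # 7, 8, 9: terminal
    )

    # Return the string for the subtree rooted at x instead of modifying a
    # variable outside the function.
    build_str <- function(x) {
      if (is.null(x$kids)) return(")")
      kids <- vapply(x$kids, function(i) build_str(tr[[i]]), character(1))
      paste(c(nam[x$split$varid], "(", kids), collapse = " ")
    }

    result <- paste("(", build_str(tr[[1]]))
    result
    ## [1] "( Petal.Width ( ) Petal.Width ( Petal.Length ( ) Petal.Width ( ) ) )"
    ```

    With the real objects, tr <- as.list(pres$node) and nam <- names(pres$data) would replace the stand-ins and build_str could stay unchanged.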