
Understanding partykit graph out of j48 in R


I've analyzed a dataset of 266 instances and about 100 indicators using a J48 tree in R. I'm not very experienced in machine learning, but I managed to build the J48 tree in both Weka and R. In R, I found that the tree can be visualized through the partykit package. However, I find it difficult to interpret the results, shown below (X, Y and Z are 3 of the 100+ indicators I use to describe each of the 266 instances, of which 190 are normal, or 0, and 76 are abnormal, or 1).

[Figure: J48 pruned tree]

The code I used is very simple:

library("RWeka")  # provides J48() and Weka_control()
m1 <- J48(Case ~ ., data = mydata, control = Weka_control(R = TRUE))
if (require("partykit", quietly = TRUE)) plot(m1)

Thus I've pruned the tree. One more question: I understand that I can obtain the fitted values from the tree, but I don't know how. Any help with either or both questions would be appreciated.


Solution

  • The variables X, Y, Z have been selected to split (or partition) your data while the remaining variables have not been selected. The resulting terminal nodes thus lead to different probabilities for the response. The response probabilities are also displayed by the stacked bar plots in the terminal nodes of the visualization.

    For example, if X <= 34, the response probability is rather low (around 17%). This is the largest subset, with 193 of the 266 observations. The only subset for which the response probability is very high (around 96%) is the 35 observations with X > 34 & Y <= 482 & Z > 451.

    As already pointed out by @Roman Luštrik: The fitted values for each observation can be obtained by predict(m1, type = "prob").
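
The points above can be tried out with a minimal sketch. The asker's `mydata` is not available, so the built-in `iris` data stands in for it and `Species` for `Case`; this is an assumption made only for illustration. The object names `prob`, `cls`, `node` and `tab` are likewise made up for this example. It assumes RWeka (with a working Java) and partykit are installed.

```r
# Sketch only: iris stands in for the asker's `mydata`, Species for `Case`.
library("RWeka")     # J48() and its predict() method (needs a working Java)
library("partykit")  # as.party() and node-level predictions

m1 <- J48(Species ~ ., data = iris, control = Weka_control(R = TRUE))

# Fitted class probabilities: one row per observation, one column per class
prob <- predict(m1, type = "prob")

# Fitted classes (the default prediction type)
cls <- predict(m1)

# Terminal node each observation falls into, via the partykit representation
node <- predict(as.party(m1), type = "node")

# Cross-tabulate terminal node against the observed response;
# row-wise proportions give the class shares within each terminal node
tab <- table(node, iris$Species)
print(prop.table(tab, margin = 1))
```

With a two-class response such as the asker's `Case`, `prob` would have two columns, and each row of the proportion table should correspond to one stacked bar in the plotted tree.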