r rpart

How to get percentages from decision tree for each node


How could I create a table that includes the percentages for each node in the plot below?

library(rpart)
library(rattle)
library(rpart.plot)
library(RColorBrewer)

fit <- rpart(Species ~ ., data=iris, method="class")
fancyRpartPlot(fit)

It results in this plot:

[image: decision tree plot produced by fancyRpartPlot(fit)]

I would like to output a table with species as the first column and the associated percent at each node in a second column. A second iteration of the table would exclude the first node (100%) and also remove duplicates by retaining the row that contains a higher percentage.

After picking through the "rpart" documentation I'm still unable to figure out how to create this table. Please let me know what you think.

Thank you for your time.


Solution

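  • The where element of the rpart object is not the predicted class: for each observation it gives the row of fit$frame that holds the terminal node (leaf) the observation falls into. The values 2, 4, 5 below are therefore row indices of fit$frame, which correspond to the leaves numbered 2, 6 and 7 in the plot. You can cross-tabulate it against Species with: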

    > iris$where <- fit$where
    > with(iris, table(Species, where))
                where
    Species       2  4  5
      setosa     50  0  0
      versicolor  0 49  1
      virginica   0  5 45
    

    I'm guessing you want the percentage of all observations that fall into each terminal node, i.e. the column sums divided by the total count?

    > 100*colSums(with(iris, table(Species, where)) )/150
           2        4        5 
    33.33333 36.00000 30.66667
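
The table asked for in the question covers every node, not only the leaves, so here is a minimal sketch that builds it from fit$frame instead of fit$where. It assumes the percentage wanted is the share of all observations reaching each node (the figure fancyRpartPlot prints at the bottom of a node) and that "species" means the majority class of the node. The names node_tbl, no_root and best are made up for illustration.

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")

# fit$frame has one row per node; its row names are the node numbers
# shown in the plot, and n is the number of observations in the node.
frame <- fit$frame

node_tbl <- data.frame(
  node    = as.integer(rownames(frame)),
  # yval is the integer code of the majority class; map it to its label
  Species = attr(fit, "ylevels")[frame$yval],
  # share of all training observations that reach the node (row 1 is the root)
  percent = round(100 * frame$n / frame$n[1], 1),
  leaf    = frame$var == "<leaf>"
)
print(node_tbl)

# Second table: drop the root node, then keep one row per species,
# namely the row with the highest percentage.
no_root <- node_tbl[node_tbl$node != 1, ]
no_root <- no_root[order(no_root$Species, -no_root$percent), ]
best    <- no_root[!duplicated(no_root$Species), c("Species", "percent")]
print(best)
```

Restricting node_tbl to rows where leaf is TRUE should give the same percentages as the colSums result above. If the per-class probabilities inside each node are wanted instead, they are stored in fit$frame$yval2.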