
What does the number on top of a node in a fancyRpartPlot decision tree mean?


I've highlighted the numbers in question in the picture below.

Example fancyRPartPlot

My guess is that they are the order/rank of the nodes, but I can't explain the jumps in the numbers (in the example, 9-11 are missing).


Solution

  • The numbers at the top of each node in the plot are the node numbers shown in the textual representation of the tree, as generated by the default print() method. To confirm:

    > dt <- rpart::rpart(Species ~ ., iris)
    > print(dt)
    n= 150 
    
    node), split, n, loss, yval, (yprob)
          * denotes terminal node
    
    1) root 150 100 setosa (0.33 0.33 0.33)  
      2) Petal.Length< 2.45 50   0 setosa (1.00 0.00 0.00) *
      3) Petal.Length>=2.45 100  50 versicolor (0.00 0.50 0.50)  
        6) Petal.Width< 1.75 54   5 versicolor (0.00 0.91 0.093) *
        7) Petal.Width>=1.75 46   1 virginica (0.00 0.022 0.98) *
    > rattle::fancyRpartPlot(dt)
    


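    The "jumps" come from the way rpart() numbers nodes: the root is node 1, and the children of node k are numbered 2k and 2k+1, as if the tree were a complete binary tree. Numbers that would belong to branches that were never grown, or that were pruned away, do not appear in the final tree. In the output above, node 2 is terminal, so nodes 4 and 5 are absent and the children of node 3 are numbered 6 and 7.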
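
To check the numbering directly, here is a minimal sketch. It assumes the same iris tree as above, and relies on rpart storing the node numbers as the row names of the fitted object's frame component. The variable names (nodes, parent, depth, missing) are only for illustration.

```r
# Fit the same tree as in the answer above.
dt <- rpart::rpart(Species ~ ., iris)

# Node numbers are the row names of the tree's frame.
nodes <- as.integer(rownames(dt$frame))
print(nodes)

# Children of node k are 2k and 2k + 1, so the parent of node k is k %/% 2
# (the root gets 0, meaning "no parent") and its depth is floor(log2(k)).
parent <- nodes %/% 2
depth <- floor(log2(nodes))
print(data.frame(node = nodes, parent = parent, depth = depth))

# Numbers up to the largest node that are not used by any node:
# these are the "jumps" seen in the plot.
missing <- setdiff(seq_len(max(nodes)), nodes)
print(missing)
```

For the tree printed above, nodes should be 1, 2, 3, 6, 7 and missing should be 4 and 5, the children that terminal node 2 would have had if it had been split further.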