How to interpret an unusual decision tree output (multi-classes) using rpart


I am trying to plot a decision tree using the rpart package and am really confused by its output. At the 3rd node, how can the agriculture and mining classes be produced from urban?

I think it should be agriculture and urban instead of agriculture and mining. Here is my code:

library(rpart)
library(rpart.plot)

# Load the land-cover data set
df <- read.csv("https://raw.githubusercontent.com/tuyenhavan/Statistics/Dataset/Landsat_Data.csv")

set.seed(123)

# Fit a classification tree on all predictors and plot it
dt <- rpart(Land_cover ~ ., data = df)
rpart.plot(dt, cex = 0.35)

Please help me interpret it. Thank you.


Solution

  • The nodes display the relative frequencies of all response categories along with the majority vote, i.e., the most frequent category. In case there are ties, the first of those most frequent categories is displayed as the majority vote (which is a somewhat arbitrary selection, of course).

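    Therefore, in the root node all categories occur with equal frequency (20% each), and "Agriculture" is displayed as the majority vote because it is lexicographically the first category.

    Similarly, in node 3 (for Band1 >= 0.03599656) "Urban" and "Water" are still tied for the most frequent category (200 observations = 24.969%), and thus "Urban" is listed as the majority vote. The label does not mean the node contains only "Urban" observations: the other classes are still present in it, so its child nodes can have different majority classes, such as "Agriculture" and "Mining".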
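
The tie-breaking described above can be reproduced on a small example. The following is a minimal sketch that uses the built-in iris data instead of the Landsat file (an assumption made so that it runs without a download); the object names fit, counts and label are only illustrative. It reads the per-node class counts from the yval2 matrix of the fitted tree's frame, whose first column is the fitted class and whose next columns are the counts per class.

```r
library(rpart)

# iris has three classes with 50 observations each,
# so the root node is a three-way tie
fit <- rpart(Species ~ ., data = iris)

# Number of response classes
k <- nlevels(iris$Species)

# Columns 2..(k + 1) of yval2 hold the class counts for each node
counts <- fit$frame$yval2[, 2:(k + 1), drop = FALSE]
colnames(counts) <- levels(iris$Species)
rownames(counts) <- rownames(fit$frame)  # node numbers

# yval is the index of the class shown as the node label
label <- levels(iris$Species)[fit$frame$yval]

# One row per node: node number, displayed label, class counts
print(data.frame(node = rownames(fit$frame), label = label, counts))
```

With the default rpart settings, the root node should be labeled "setosa" although all three classes are tied, and node 3 should be labeled "versicolor" although it is tied with "virginica". The child nodes of node 3 then carry their own majority labels. The same inspection should work on dt from the question by replacing iris$Species with the Land_cover factor.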