I am learning how to code in R for machine learning, and I am using rpart to do the heavy lifting. However, when I go to plot my decision tree, only a single leaf node labelled 'yes' is plotted. I've built the decision tree by hand using information gain, and it should have three levels of nodes.
Here is what R gives me.
Here is my R code.
library(FSelector)
library(rpart)
library(rpart.plot)
library(caret)
library(dplyr)
library(data.tree)
library(caTools)
table <- read.csv("play-data.csv")
table <- select(table, Outlook, Temperature, Humidity, Windy, Play)
table <- mutate(table, Outlook = factor(Outlook), Temperature = factor(Temperature), Humidity = factor(Humidity), Play = factor(Play))
tree <- rpart(Play ~ Outlook + Temperature + Humidity + Windy, data = table)
prp(tree)
Here is the data from 'play-data.csv'.
The data is being read in correctly, and the select and mutate calls seem to be fine as well, so I don't know what gives. I tried Googling the problem, but I only found one other thread about it, with no clear answer that I could follow.
You are getting a tree with a single node because you are using the default settings for rpart. The documentation is a little indirect: it tells you there is a parameter called control and says "See rpart.control." If you follow that through to the documentation for rpart.control, you will find a parameter called minsplit, described as "the minimum number of observations that must exist in a node in order for a split to be attempted." Its default value is 20, and you only have 14 observations altogether, so rpart never even attempts to split the root node. Instead, pass rpart.control to rpart and set minsplit to a lower value (try 2).
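As a rough sketch, assuming your data frame is still called table as in your question, the call would look something like this:

library(rpart)
library(rpart.plot)

# Same model as before, but with minsplit lowered so splits are
# attempted even on a 14-row data set.
tree <- rpart(Play ~ Outlook + Temperature + Humidity + Windy,
              data = table,
              method = "class",
              control = rpart.control(minsplit = 2))
prp(tree)

# If the tree still collapses to a single node, the complexity
# parameter may be pruning it away; in that case also try
# control = rpart.control(minsplit = 2, cp = 0).

The only change from your code is the control argument (and making the classification method explicit); everything else stays the same.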