r decision-tree rpart

With quotation marks and without quotation marks: what is the difference?


For the following code, I want to find the row of the cp table that has the lowest xerror:

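# Install the packages once if they are not already available:
# install.packages(c("rpart", "rpart.plot"))
library(rpart)
library(rpart.plot)

data(iris)
set.seed(161)

# Default tree and a tree grown with a smaller complexity parameter
tree.model1 <- rpart(Sepal.Length ~ ., data = iris)
tree.model2 <- rpart(Sepal.Length ~ ., data = iris, cp = 0.005)
tree.model2$cptable

# Plot both trees side by side
par(mfrow = c(1, 2))
rpart.plot(tree.model1)
rpart.plot(tree.model2)

# Row of the cptable with the lowest cross-validated error
which.min(tree.model2$cptable[, "xerror"])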

My question is about the last line. If I write which.min(tree.model2$cptable[, xerror]) instead, it doesn't work.

What is the function of the quotation marks here?


Solution

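  • R requires quotation marks when you index by name. Inside [ ], "xerror" is a character string that R matches against the column names of the cptable. Without the quotes, xerror is a symbol: R looks for an object with that name, and because no object called xerror exists in your workspace, the call fails with an "object 'xerror' not found" error. I assume the confusion comes from the fact that names such as tree.model2 are written without quotes elsewhere in your code. Those are objects you created, whereas xerror is only a column label inside the table, so you have to distinguish a name used as an index from an object itself.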

    So [ ] indexing does not let you write xerror without quotes. A numeric index does work, for instance which.min(tree.model2$cptable[, 4]), because xerror is the 4th column of the cptable, so 4 is another index for "xerror".

    You'll pick these things up as you gain more experience with R. Another tip: write your code neatly and comment it, so that both you and others can understand it easily.
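
The difference between the quoted, positional, and unquoted forms described above can be tried out with a short sketch. It assumes the rpart package is installed and refits the same tree as in the question. The names fit, cp_table, and col_name are illustrative and not part of the original code.

```r
library(rpart)

set.seed(161)
fit <- rpart(Sepal.Length ~ ., data = iris, cp = 0.005)
cp_table <- fit$cptable

# Quoted: a character string that is matched against the column names.
by_name <- which.min(cp_table[, "xerror"])

# Numeric position: "xerror" is the 4th column of the cptable.
by_position <- which.min(cp_table[, 4])

# Unquoted names are looked up as objects, so this works only because
# col_name (an illustrative variable) exists and holds the string "xerror".
col_name <- "xerror"
by_variable <- which.min(cp_table[, col_name])

# No object called xerror exists, so the unquoted form raises an error.
# tryCatch captures the error message instead of stopping the script.
unquoted <- tryCatch(
  which.min(cp_table[, xerror]),
  error = function(e) conditionMessage(e)
)
print(unquoted)

# CP value of the row with the lowest cross-validated error.
best_cp <- cp_table[by_name, "CP"]
print(best_cp)
```

The three working forms select the same row. The value in best_cp could then be passed to prune(fit, cp = best_cp) if the goal is to prune the tree at the lowest cross-validated error.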