r, dplyr, decision-tree

Preparing data with dplyr gives: NAs introduced by coercion


I am following along with a book on building decision trees and thought I could make a piece of code a bit prettier. This is the code in question:

library(tree)
library(ISLR)
library(dplyr)

attach(Carseats)

High = ifelse(Sales <= 8, "No", "Yes")
Carseats = data.frame(Carseats, High)
tree.carseats <- tree(High~ . -Sales, Carseats)

The code adds a High column to the Carseats data frame and then fits a tree to it.

The code I thought would be prettier to read is:

library(tree)
library(ISLR)
library(dplyr)

Carseats <- Carseats %>% mutate(High = ifelse(Sales <= 8, "No", "Yes"))
tree.carseats <- tree(High~ . -Sales, Carseats)

However, running the last line of the altered code gives this warning:

Warning message:
In tree(High ~ . - Sales, Carseats) : NAs introduced by coercion

With the modified code, calling summary() on tree.carseats then throws an error:

Error in y - frame$yval[object$where] : 
  non-numeric argument to binary operator

What is wrong with my thinking process here?


Solution

  • I am not sure where the problem originates, but it goes away if you call factor() on the result of if_else().

    In general, attaching the data with attach() is not recommended, because it can lead to unpredictable behavior.

    library(tree)
    library(ISLR)
    library(dplyr)
    
    data("Carseats")
    
    Carseats <- Carseats %>% mutate(High = factor(if_else(Sales <= 8, "No", "Yes")))
    
    tree.carseats <- tree(High~ . -Sales, data = Carseats)
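A likely explanation for why factor() helps, sketched with base R only: before R 4.0.0, data.frame() converted character columns to factors by default (stringsAsFactors = TRUE), so the book's version produced a factor response, whereas mutate() keeps the ifelse() result as a character vector. The toy data frame and the names book_style, pipe_style and coerced below are made up for illustration, and the assumption that tree() coerces a non-factor response to numeric is inferred from the warning text rather than checked against the package source.

```r
# Toy stand-in for Carseats (hypothetical values, base R only).
toy <- data.frame(Sales = c(5.2, 9.1, 7.8, 11.4))
high <- ifelse(toy$Sales <= 8, "No", "Yes")

# Old default (R < 4.0.0): data.frame() turned strings into factors.
# Set explicitly here so the result does not depend on the R version.
book_style <- data.frame(toy, High = high, stringsAsFactors = TRUE)

# Plain column assignment, like mutate(), keeps the character vector as-is.
pipe_style <- toy
pipe_style$High <- high

class(book_style$High)  # "factor"
class(pipe_style$High)  # "character"

# Coercing text labels to numbers yields NA and the
# "NAs introduced by coercion" warning (suppressed here).
coerced <- suppressWarnings(as.numeric(pipe_style$High))
all(is.na(coerced))     # TRUE

# Wrapping the labels in factor() restores a categorical response.
pipe_style$High <- factor(pipe_style$High)
levels(pipe_style$High) # "No" "Yes"
```

If this is the cause, the book's original code would hit the same warning on R 4.0.0 or later unless stringsAsFactors = TRUE is passed or High is wrapped in factor().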