Error in converting categorical variables to factor in R

While following this tutorial, I tried a different method for converting categorical variables to factors.

The article uses the following method:

library(MASS)
library(rpart)
cols <- c('low', 'race', 'smoke', 'ht', 'ui')
birthwt[cols] <- lapply(birthwt[cols], as.factor)

I replaced the last line with:

birthwt[cols] <- as.factor((birthwt[cols]))

but the result is all NA.

What is wrong with that?

Solution

as.factor(birthwt[cols]) calls as.factor on a list of 5 vectors (a data frame is a list of its columns). R then treats each of those 5 vectors as a single element and tries to build one factor of length 5, with the deparsed vectors as the levels and the column names as the labels, which is clearly not what you want:

> as.factor(birthwt[cols])
  low  race smoke    ht    ui 
 <NA>  <NA>  <NA>  <NA>  <NA> 
5 Levels: c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) ...
> labels(as.factor(birthwt[cols]))
[1] "low"   "race"  "smoke" "ht"    "ui"
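One way to see why this happens is to check what the subset actually is. The sketch below uses a small made-up data frame (toy and its column values are hypothetical stand-ins for birthwt) so that it runs with base R only:

```r
# Hypothetical stand-in for birthwt: two 0/1 columns and one numeric column
toy <- data.frame(
  low   = c(0, 1, 0, 1),
  smoke = c(1, 1, 0, 0),
  bwt   = c(2500, 3100, 2900, 3300)
)
cols <- c("low", "smoke")

sub <- toy[cols]
is.list(sub)   # a data frame is a list of column vectors
length(sub)    # one element per selected column
nrow(toy)      # one row per observation
```

The length of the subset is the number of columns, not the number of rows, so a function that treats its argument as a single vector sees only as many elements as there are columns.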

lapply iterates over a list, calling as.factor on each vector in that list separately. That is what you need here: each variable is converted into its own factor, instead of the entire list being converted into a single factor, which is what as.factor(birthwt[cols]) attempts.
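As a minimal sketch of the per-column conversion, here is the same pattern on a small made-up data frame (toy is an assumption for illustration, not the birthwt data):

```r
# Hypothetical data: two 0/1 columns to convert and one numeric column to leave alone
toy <- data.frame(
  low   = c(0, 1, 0, 1),
  smoke = c(1, 1, 0, 0),
  bwt   = c(2500, 3100, 2900, 3300)
)
cols <- c("low", "smoke")

# Convert each selected column separately; lapply returns a list of factors,
# which is assigned back into the matching columns
toy[cols] <- lapply(toy[cols], as.factor)

# Check the result column by column
sapply(toy, is.factor)
levels(toy$low)
```

Each converted column keeps its original length, with one factor value per row, and the columns not named in cols are left unchanged.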