While following this tutorial, I tried a different way of converting the categorical variables to factors.
The article uses the following method:
library(MASS)
library(rpart)
cols <- c('low', 'race', 'smoke', 'ht', 'ui')
birthwt[cols] <- lapply(birthwt[cols], as.factor)
I replaced the last line with
birthwt[cols] <- as.factor((birthwt[cols]))
but the result is all NA.
What is wrong with that?
as.factor((birthwt[cols])) calls as.factor on a list of 5 vectors (a data frame is a list of its columns). If you do that, R will interpret each of those 5 vectors as a level, and the column headers as the labels, of a single factor variable, which is clearly not what you want:
> as.factor(birthwt[cols])
low race smoke ht ui
<NA> <NA> <NA> <NA> <NA>
5 Levels: c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) ...
> labels(as.factor(birthwt[cols]))
[1] "low" "race" "smoke" "ht" "ui"
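To see why this happens, it may help to check what birthwt[cols] actually is. The following is a minimal sketch using only base R and the MASS data set from the question; the name sub is just an illustrative choice, not something from the tutorial.

```r
library(MASS)  # provides the birthwt data set

cols <- c('low', 'race', 'smoke', 'ht', 'ui')
sub <- birthwt[cols]  # `sub` is a hypothetical name used only for this sketch

# A data frame is a list with one element per column
is.list(sub)
length(sub)

# as.factor behaves as expected when given a single column (a plain vector)
f <- as.factor(birthwt$race)
is.factor(f)
levels(f)
```

So as.factor itself is fine; the problem is only that it was handed the whole list instead of one vector at a time.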
lapply iterates over a list, calling the function as.factor on each of the vectors in that list separately. You need this to convert each variable into its own factor, rather than attempting to convert the entire list into a single factor, which is what as.factor(birthwt[cols]) does.
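One way to confirm that the column-by-column conversion worked is sketched below. It works on a copy of the data (bw is an illustrative name, not part of the original code) so that the original birthwt is left untouched.

```r
library(MASS)  # provides the birthwt data set

cols <- c('low', 'race', 'smoke', 'ht', 'ui')
bw <- birthwt  # `bw` is a hypothetical copy used only for this sketch

# Convert each selected column to a factor, one column at a time
bw[cols] <- lapply(bw[cols], as.factor)

# Check that every selected column is now a factor
sapply(bw[cols], is.factor)

# The data frame keeps its original number of rows and columns
dim(bw)
```

If any entry of the sapply result were FALSE, that column would not have been converted.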