My objective is to get a logical vector indicating whether each column in a data.frame is a factor. I used is.factor() inside both sapply() and apply(), and they return different values; the result from apply() looks wrong. Can someone tell me what causes the difference?
X <- data.frame(X1 = c(1, 2, 3, 4),             ## numeric
                X2 = factor(paste0("f", 1:4)))  ## factor
sapply(X, is.factor)
## FALSE TRUE
apply(X, 2, is.factor)
## FALSE FALSE   (apparently wrong: the second value should be TRUE)
The same thing happens with other functions such as class() and is.numeric().
From the documentation of apply:
Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.
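Therefore, apply first converts your data.frame to a matrix (array) with as.matrix(), and every element of a matrix must have the same atomic type. Because X contains a non-numeric column (the factor), the whole data.frame is coerced to character: the factor is replaced by its labels and the numeric column is formatted as strings. By the time is.factor() is called, each column is a plain character vector, so the result is FALSE for both.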
> as.matrix(X)
     X1  X2
[1,] "1" "f1"
[2,] "2" "f2"
[3,] "3" "f3"
[4,] "4" "f4"
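To check this yourself, here is a minimal sketch that rebuilds the X from the question, asks apply() what class it sees in each column, and then gets the per-column answer with vapply(). The use of vapply() and the name is_fac are my own additions; sapply() as in the question works just as well.

```r
X <- data.frame(X1 = c(1, 2, 3, 4),
                X2 = factor(paste0("f", 1:4)))

# apply() works on as.matrix(X), so every column arrives as a character vector
seen_by_apply <- apply(X, 2, class)
seen_by_apply
## both columns report "character"

# vapply() (like sapply()) iterates over the data.frame as a list of columns,
# so each column keeps its own class; logical(1) enforces one TRUE/FALSE per column
is_fac <- vapply(X, is.factor, logical(1))
is_fac
##    X1    X2
## FALSE  TRUE
```

In short, use sapply(), vapply() or lapply() when you want to inspect the columns of a data.frame, and keep apply() for real matrices and arrays.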