In a dataframe, I want to be able to separate columns with numeric types from columns with strings/characters.
Here is my data:
test=data.frame(col1=sample(1:20,10),col2=sample(31:50,10),
col3=sample(101:150,10),col4=sample(c('a','b','c'),10,replace=T))
Which looks like:
col1 col2 col3 col4
1 2 41 132 c
2 11 47 141 b
3 13 39 135 a
4 12 31 117 b
5 19 42 106 a
6 8 50 118 a
7 14 33 149 a
8 6 48 148 b
9 16 37 150 b
10 9 34 140 a
Now here is the strange thing: if I call typeof on a single cell that contains a character, R says it is an integer:
> typeof(test[1,4])
[1] "integer"
If I do something like this
> apply(test,2,typeof)
col1 col2 col3 col4
"character" "character" "character" "character"
R says they are all characters. Also,
> lapply(test,typeof)
[1] "integer" "integer" "integer" "integer"
Again, what is going on and is there a good way to distinguish between columns with characters and columns with integers?
apply works on arrays and matrices, not data frames. To work on a data frame, it first converts it to a matrix. A matrix can hold only one type, and your data frame has a factor column, so the conversion turns everything into character strings, without bothering to tell you.
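A minimal sketch of that conversion, assuming a data frame shaped like the one in the question. Two things here are my additions rather than part of the original post: the seed is arbitrary, and stringsAsFactors = TRUE is passed explicitly because on R 4.0.0 and later data.frame() no longer turns strings into factors by default.

```r
# Rebuild a data frame like the one in the question.
# stringsAsFactors = TRUE reproduces the factor column on R >= 4.0.0.
set.seed(1)  # arbitrary seed, only for reproducibility
test <- data.frame(col1 = sample(1:20, 10),
                   col2 = sample(31:50, 10),
                   col3 = sample(101:150, 10),
                   col4 = sample(c("a", "b", "c"), 10, replace = TRUE),
                   stringsAsFactors = TRUE)

# apply() calls as.matrix() on a data frame before doing anything else.
m <- as.matrix(test)

matrix_type <- typeof(m)           # one type for the whole matrix
cell_type   <- typeof(test[1, 4])  # a factor stores integer codes
cell_class  <- class(test[1, 4])   # the class still says it is a factor

print(c(matrix = matrix_type, cell_type = cell_type, cell_class = cell_class))
```

This is why apply(test, 2, typeof) reports "character" for every column while typeof on a single cell of col4 reports "integer".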
As you have seen, sapply is the way to go, and class is probably the thing you want to find out, although there's also mode, typeof, and storage.mode, depending on what you want to know:
> test$col5=letters[1:10] # really character, not a factor
> test$col3=test$col3*pi # let's get some decimals in there
> sapply(test, mode)
col1 col2 col3 col4 col5
"numeric" "numeric" "numeric" "numeric" "character"
> sapply(test, class)
col1 col2 col3 col4 col5
"integer" "integer" "numeric" "factor" "character"
> sapply(test, typeof)
col1 col2 col3 col4 col5
"integer" "integer" "double" "integer" "character"
> sapply(test, storage.mode)
col1 col2 col3 col4 col5
"integer" "integer" "double" "integer" "character"
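To actually split the columns, one option is to test each column with is.numeric, which is TRUE for integer and double columns and FALSE for factors and character vectors. This is a sketch under assumptions, not the only way to do it: the data frame below is a small stand-in with fixed values rather than the sampled data from the question, and the names is_num, numeric_cols and other_cols are made up for illustration.

```r
# Stand-in data frame with the same column kinds as in the transcript above:
# two integer columns, one double, one factor, one plain character column.
test <- data.frame(col1 = 1:10,
                   col2 = 31:40,
                   col3 = (101:110) * pi,
                   col4 = factor(rep(c("a", "b"), 5)))
test$col5 <- letters[1:10]  # really character, not a factor

# One logical flag per column; vapply guarantees a logical vector back.
is_num <- vapply(test, is.numeric, logical(1))

# drop = FALSE keeps the result a data frame even if only one column matches.
numeric_cols <- test[, is_num, drop = FALSE]
other_cols   <- test[, !is_num, drop = FALSE]

print(names(numeric_cols))
print(names(other_cols))
```

If you need to tell factors apart from plain character columns as well, the same pattern works with is.factor or is.character in place of is.numeric.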