r, dataframe, coercion

Is this coercion? Why is R telling me these are the same data types?


In a dataframe, I want to be able to separate columns with numeric types from columns with strings/characters.

Here is my data:

test <- data.frame(col1 = sample(1:20, 10), col2 = sample(31:50, 10),
                   col3 = sample(101:150, 10),
                   col4 = sample(c('a', 'b', 'c'), 10, replace = TRUE))

Which looks like

   col1 col2 col3 col4
1     2   41  132    c
2    11   47  141    b
3    13   39  135    a
4    12   31  117    b
5    19   42  106    a
6     8   50  118    a
7    14   33  149    a
8     6   48  148    b
9    16   37  150    b
10    9   34  140    a

Now here is the strange thing: if I call typeof on a cell containing a character, R says it is an integer:

> typeof(test[1,4])
[1] "integer"

If I do something like this

> apply(test,2,typeof)
       col1        col2        col3        col4 
"character" "character" "character" "character" 

R says they are all characters. Also,

> lapply(test,typeof)
[1] "integer" "integer" "integer" "integer"

Again, what is going on, and is there a good way to distinguish columns holding characters from columns holding integers?


Solution

  • apply works on arrays and matrices, not data frames.

    To work on a data frame, it first converts it to a matrix.

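    Your data frame has a factor column, so the conversion to a matrix (apply uses as.matrix) turns everything into character, without bothering to tell you. A factor is stored internally as integer codes, which is also why typeof(test[1,4]) reports "integer".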

    As you have seen, sapply is the way to go, and class is probably what you want to check, although there are also mode, typeof, and storage.mode, depending on what you want to know:

    > test$col5=letters[1:10]  # really character, not a factor
    > test$col3=test$col3*pi # let's get some decimals in there
    
    
    > sapply(test, mode)
           col1        col2        col3        col4        col5 
      "numeric"   "numeric"   "numeric"   "numeric" "character" 
    > sapply(test, class)
           col1        col2        col3        col4        col5 
      "integer"   "integer"   "numeric"    "factor" "character" 
    > sapply(test, typeof)
           col1        col2        col3        col4        col5 
      "integer"   "integer"    "double"   "integer" "character" 
    > sapply(test, storage.mode)
           col1        col2        col3        col4        col5 
      "integer"   "integer"    "double"   "integer" "character"
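
To see the matrix conversion directly, and to actually split the columns as asked, something along these lines should work. It is only a minimal sketch: set.seed(1) and the names m, num_cols, numeric_part and other_part are assumptions added for illustration, and stringsAsFactors = TRUE is passed explicitly because R 4.0.0 and later no longer turn strings into factors by default.

```r
# Rebuild a data frame shaped like the one in the question.
# stringsAsFactors = TRUE reproduces the factor column discussed above
# (assumption: without it, R >= 4.0.0 would keep col4 as character).
set.seed(1)
test <- data.frame(
  col1 = sample(1:20, 10),
  col2 = sample(31:50, 10),
  col3 = sample(101:150, 10),
  col4 = sample(c("a", "b", "c"), 10, replace = TRUE),
  stringsAsFactors = TRUE
)

# apply() converts the data frame with as.matrix() first; a single
# non-numeric column is enough to make the whole matrix character.
m <- as.matrix(test)
typeof(m)

# is.numeric() is FALSE for factors, even though typeof() says "integer",
# so it separates the numeric columns from everything else.
num_cols <- sapply(test, is.numeric)
numeric_part <- test[, num_cols, drop = FALSE]
other_part <- test[, !num_cols, drop = FALSE]

names(numeric_part)
names(other_part)
```

drop = FALSE keeps each piece a data frame even when only one column is selected, which matters here because only col4 is non-numeric.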