I've always been confused by the variable types in R. Now I encountered a problem after transposing a data frame.
For example, I'm using table() to get a count of each level of a factor:
data(iris)
count <- as.data.frame(table(iris$Species))
typeof(count$Var1)
# [1] "integer"
typeof(count$Freq)
# [1] "integer"
My first question: why is count$Var1 "integer"? Can strings be "integer" too? It doesn't really matter, though, because I can change the type with count$Var1 <- as.character(count$Var1), after which typeof(count$Var1) becomes "character".
Now I transpose this data frame with transposed_count <- as.data.frame(t(count)). But I get confused because:
typeof(transposed_count[1,])
[1] "list"
typeof(transposed_count[2,])
[1] "list"
transposed_count[2,]
V1 V2 V3
Freq 50 50 50
For later use, I need transposed_count[2,] to be a numeric vector like:
transposed_count[2,]
[1] 50 50 50
How can I do that? And why did they become "list" after t()? Sorry if this is a stupid question. Thanks!
My 1st question would be, why is count$Var1 "integer"?
Because factors have integer storage type:
> is.factor(count$Var1)
[1] TRUE
and the "strings" in count$Var1 are, as is typical in R, stored as a factor: iris$Species is a factor, and as.data.frame() on a table turns the names into a factor by default.
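To see the two layers of a factor separately (the integer codes that typeof() reports and the labels you see when printing), you can inspect them with a few base R functions. This is a small sketch on the same count data.frame as in the question:

```r
data(iris)
count <- as.data.frame(table(iris$Species))

# A factor is an integer vector of level codes plus a "levels" attribute
typeof(count$Var1)        # storage type: "integer"
class(count$Var1)         # how R treats it: "factor"
levels(count$Var1)        # the labels: "setosa" "versicolor" "virginica"
as.integer(count$Var1)    # the underlying codes: 1 2 3
as.character(count$Var1)  # the labels as a plain character vector
```

So a factor is "integer" to typeof() but behaves like labelled categories everywhere else; as.character() is what gives you the actual strings.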
And why did they become "list" after t()?
When you transpose a data.frame, t() first converts it to a matrix with as.matrix(), and every entry of a matrix must have the same storage type. Since count mixes a factor column with an integer column, what you actually get first is a character matrix: the integer counts are coerced to strings like "50". Then, when you convert back with as.data.frame(), those strings become (new) factors if stringsAsFactors is TRUE, which was the default before R 4.0.0.
> t(count)
[,1] [,2] [,3]
Var1 "setosa" "versicolor" "virginica"
Freq "50" "50" "50"
> transposed_count <- as.data.frame(t(count))
> transposed_count[2,1]
Freq
50
Levels: 50 setosa
> as.numeric(transposed_count[2,1])
[1] 1
So what was a count of 50 is now a factor whose underlying integer code is 1! Not what you want.
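If you do end up with numbers stored as a factor, the usual fix is to convert through the labels rather than the codes. The sketch below passes stringsAsFactors = TRUE explicitly so it reproduces the pre-R 4.0.0 behaviour shown above regardless of which R version you run:

```r
data(iris)
count <- as.data.frame(table(iris$Species))

# Force the old default so the example behaves the same in any R version
transposed_count <- as.data.frame(t(count), stringsAsFactors = TRUE)
f <- transposed_count[2, 1]

levels(f)                    # "50" "setosa": the column mixed names and counts
as.numeric(f)                # 1: the level code, not the count
as.numeric(as.character(f))  # 50: go through the labels instead
```

This only recovers the value; the column still mixes species names and counts, which is why transposing the whole data.frame is the wrong starting point here.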
As to why typeof(transposed_count[1,]) is "list": a row slice of a data.frame is itself a data.frame.
> is.data.frame(transposed_count[2,])
[1] TRUE
And a data.frame is just a list of columns with a class attribute, so typeof() reports "list".
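Because a row slice is a list, unlist() flattens it into the plain vector asked for in the question, provided all the columns are numeric (mixed types would again be coerced to a common type). A sketch, using a hypothetical wide data.frame built directly from the counts rather than via t():

```r
data(iris)
count <- as.data.frame(table(iris$Species))

# Hypothetical wide layout: one numeric column per species, built without t()
wide <- as.data.frame(as.list(setNames(count$Freq, as.character(count$Var1))))

row <- wide[1, ]
is.data.frame(row)   # TRUE: a row slice keeps the data.frame class
typeof(row)          # "list": a data.frame is a list of columns
v <- unlist(row)     # flatten the row into an atomic vector, names kept
v                    # setosa, versicolor and virginica, each 50
```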
But how can I get a "transposed" data frame then?
It sounds like you may want something like this, using reshape2:
> library(reshape2)
> dcast(melt(count), variable~Var1)
Using Var1 as id variables
variable setosa versicolor virginica
1 Freq 50 50 50
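If you'd rather not depend on reshape2, base R's stats::reshape() can produce a similar wide table. This is a sketch, not a drop-in equivalent of dcast(): the helper column variable is added by hand here only to mimic the column dcast() creates, and reshape() names the new columns Freq.setosa and so on.

```r
data(iris)
count <- as.data.frame(table(iris$Species))

# reshape() needs an id column; "variable" is a hand-made stand-in for dcast's
long <- count
long$variable <- "Freq"

wide <- reshape(long, direction = "wide",
                idvar = "variable", timevar = "Var1", v.names = "Freq")
wide   # one row: variable, Freq.setosa, Freq.versicolor, Freq.virginica
```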
After I read all samples in, I'm going to rbind all the data frames.
You'll have to ensure the columns line up appropriately. Depending on the analysis to come, it may be more natural to rbind the data frames as they are, with another column indicating the source.
> count2 <- count
> count$source = "file1"
> count2$source = "file2"
> (mcount <- rbind(count,count2))
Var1 Freq source
1 setosa 50 file1
2 versicolor 50 file1
3 virginica 50 file1
4 setosa 50 file2
5 versicolor 50 file2
6 virginica 50 file2
Now you don't have to worry about alignment if you do want to reshape later:
> dcast(melt(mcount), ...~Var1)
Using Var1, source as id variables
source variable setosa versicolor virginica
1 file1 Freq 50 50 50
2 file2 Freq 50 50 50
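For this particular case, where the values are counts, a base R alternative is xtabs(), which sums Freq for every (source, Var1) pair and lays sources out as rows and species as columns. A sketch that rebuilds mcount as above and converts the result to a plain data.frame:

```r
data(iris)
count <- as.data.frame(table(iris$Species))
count2 <- count
count$source <- "file1"
count2$source <- "file2"
mcount <- rbind(count, count2)

# Cross-tabulate: rows = source, columns = species, cells = summed Freq
tab <- xtabs(Freq ~ source + Var1, data = mcount)
wide <- as.data.frame.matrix(tab)
wide
```

Note that xtabs() sums duplicates, so if one file contributed the same species twice you would get the total rather than an error.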