I am trying to count how many times each factor level occurs in each column of a data frame in R, with the restriction that the factor levels must be the same in every column. I thought this would be trivial, but in two places R does not return the value I expect: when using apply with factor, and when using apply with table.
Consider this sample data:
mat <- matrix(sample(1:10,90,replace=TRUE),ncol=10,nrow=9)
mat.levels <- as.character(unique(as.vector(mat)))
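# stringsAsFactors=TRUE keeps the columns as factors in R >= 4.0.0, where the default is FALSE
mat.factor <- as.data.frame(apply(mat,2,as.character),stringsAsFactors=TRUE)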
My first step was to relevel each column so that the factor levels are the same. At first I tried:
apply(mat.factor,2,factor,levels=mat.levels)
# But the data structure is all wrong; I don't appear to have a factor anymore!
str(apply(mat.factor,2,factor,levels=mat.levels))
So I brute-forced it with a loop instead:
for (i in 1:ncol(mat.factor)) {
levels(mat.factor[,i]) <- mat.levels
}
Then I ran into another problem with apply. I thought that with the factor levels set, table should return a count of 0 for any level that is missing from a column. However, when I used apply, the levels with a zero count were dropped!
apply(mat.factor,2,table)$V10
str(apply(mat.factor,2,table)$V10)
# But running table on just that one column yields the expected result!
table(mat.factor[,10])
str(table(mat.factor[,10]))
Would somebody explain what is happening in these two cases? What am I misunderstanding?
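Read the first sentence in the Details section of ?apply and then run as.matrix(mat.factor) to see the problem. apply coerces a data frame to a matrix via as.matrix, so your factor columns become plain character columns and lose their levels before factor or table ever sees them. Use lapply for data frames, not apply.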
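To see what as.matrix does here, a minimal sketch with a small hypothetical data frame (`df`, `lv`, `via_apply` and `via_lapply` are names made up for illustration, not taken from the question):

```r
# Hypothetical two-column data frame of factors sharing the same levels
lv <- c("a", "b", "c")
df <- data.frame(x = factor(c("a", "a", "b"), levels = lv),
                 y = factor(c("c", "c", "c"), levels = lv))

m <- as.matrix(df)   # the coercion apply() performs on a data frame
is.character(m)      # TRUE: the factor columns are now plain strings

via_apply  <- apply(df, 2, table)   # tables built from character columns
via_lapply <- lapply(df, table)     # tables built from the factors themselves

names(via_apply$y)   # only the level that actually occurs: "c"
names(via_lapply$y)  # all levels, including zero counts: "a" "b" "c"
```

Since the levels are gone by the time table runs inside apply, it can only count the values that are actually present in each column.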
Here's an example:
mat.factor <- as.data.frame(lapply(mat.factor,factor,levels = mat.levels))
lapply(mat.factor,table)
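Because every column now shares the same levels, each table has the same length, so sapply should be able to collect the counts into a single matrix. A small sketch, with the sample data regenerated so the block runs on its own; the seed is arbitrary and `counts` is a name introduced here for illustration:

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
mat <- matrix(sample(1:10, 90, replace = TRUE), ncol = 10, nrow = 9)
mat.levels <- as.character(unique(as.vector(mat)))
mat.factor <- as.data.frame(apply(mat, 2, as.character))
mat.factor <- as.data.frame(lapply(mat.factor, factor, levels = mat.levels))

# One row per level, one column per original column; absent levels count as 0
counts <- sapply(mat.factor, table)
dim(counts)       # number of levels by number of columns
colSums(counts)   # every column sums to nrow(mat)
```

Note also that `levels(x) <- mat.levels`, as in the loop in the question, relabels the existing levels in place rather than adding the missing ones, so it can silently change the data. `factor(x, levels = mat.levels)` keeps the values and only extends the set of levels.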