Finding the number of each factor per column in R


I am trying to write code that counts how often each factor level occurs in each column of a matrix in R, with the restriction that every column should use the same set of factor levels. I thought this would be trivial, but in two places R does not return what I expect: when using apply with factor, and when using apply with table.

Consider this sample data:

mat <- matrix(sample(1:10,90,replace=TRUE),ncol=10,nrow=9)
mat.levels <- as.character(unique(as.vector(mat)))
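# stringsAsFactors = TRUE keeps the factor conversion that was the default before R 4.0.0
mat.factor <- as.data.frame(apply(mat,2,as.character), stringsAsFactors = TRUE)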
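Whether as.data.frame() turns a character matrix into factor columns depends on the R version: since R 4.0.0 the default is stringsAsFactors = FALSE, so the columns stay character unless that argument is set explicitly. A quick check on a small made-up matrix (not the question's data):

```r
# A toy character matrix (hypothetical values, not the question's data)
m <- matrix(c("1", "2", "3", "1"), nrow = 2)

df_default <- as.data.frame(m)                           # default conversion
df_factor  <- as.data.frame(m, stringsAsFactors = TRUE)  # force factor columns

str(df_default)  # chr columns in R >= 4.0.0, Factor columns in older versions
str(df_factor)   # Factor columns in every version
```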

My first step was to relevel each column so that the factor levels are the same. At first I tried:

apply(mat.factor,2,factor,levels=mat.levels)
# But the data structure is all wrong: I don't appear to have a factor anymore!
str(apply(mat.factor,2,factor,levels=mat.levels))
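The coercion involved can be reproduced on a tiny data frame with factor columns (made-up values, for illustration only):

```r
# Toy data frame with factor columns (hypothetical values)
df <- data.frame(a = factor(c("x", "y")), b = factor(c("y", "y")))

# apply() starts by calling as.matrix(); with factor columns that gives a character matrix
m <- as.matrix(df)
str(m)

# Each call to factor() does return a factor, but apply() then assembles the
# results into a matrix, and a matrix cannot store the factor class
res <- apply(df, 2, factor, levels = c("x", "y", "z"))
str(res)
```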

So I brute-forced it with a loop instead:

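for (i in seq_len(ncol(mat.factor))) {
  # factor(..., levels =) matches each value to the shared level set;
  # assigning with levels<- would instead rename the existing levels by position
  mat.factor[[i]] <- factor(mat.factor[[i]], levels = mat.levels)
}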
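One caution when re-leveling a factor: levels(x) <- new_levels renames the existing levels by position instead of matching values to labels, so it can silently change the data, whereas factor(x, levels = new_levels) keeps every value that appears in new_levels and simply adds the unused ones. A minimal illustration with made-up values:

```r
f <- factor(c("b", "c", "b"))   # current levels: "b", "c"
new_levels <- c("a", "b", "c")

relabelled <- f
levels(relabelled) <- new_levels               # renames "b" -> "a", "c" -> "b", adds "c"

releveled <- factor(f, levels = new_levels)    # keeps the values, adds the unused level "a"

as.character(relabelled)  # the data have changed
as.character(releveled)   # the data are unchanged
```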

Then I ran into another problem with apply. Now that the factor levels were set, I expected table to return a count of 0 for any level that is missing from a column. However, when I used apply, the levels with a zero count seemed to be dropped!

apply(mat.factor,2,table)$V10
str(apply(mat.factor,2,table)$V10)
#But running table just on that one column yields the expected result!
table(mat.factor[,10])
str(table(mat.factor[,10]))
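The difference comes from what table() receives. A factor carries its full level set, so unused levels are counted as 0; a character vector is converted with factor() inside table(), which only creates levels for the values that actually occur. A minimal comparison with made-up values:

```r
x_chr <- c("3", "1", "3")                          # plain character vector
x_fac <- factor(x_chr, levels = c("1", "2", "3"))  # factor with an unused level "2"

table(x_chr)  # counts only the values present: "1" and "3"
table(x_fac)  # counts every level, including 0 for "2"
```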

Could somebody explain what is happening in these two cases? What am I misunderstanding?


Solution

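  • Read the first sentence of the Details section of ?apply, then run as.matrix(mat.factor) to see the problem. apply coerces a data frame to a matrix with as.matrix before it calls FUN, and with factor (or character) columns that gives a character matrix. Every column FUN receives is therefore a plain character vector with no levels attached, so table only counts the values that occur, and the factors built by factor are flattened back into a character matrix when apply combines the results. Use lapply for data frames, not apply.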

    Here's an example:

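    # lapply works column by column on the data frame itself, so each column stays a factor
    mat.factor <- as.data.frame(lapply(mat.factor, factor, levels = mat.levels))
    # table() on a factor reports every level, including those with a count of 0
    lapply(mat.factor, table)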
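If a single levels-by-columns count matrix is more convenient than the list returned by lapply, vapply can collect the per-column tables, since every column now has the same levels. The sketch below uses a small hypothetical matrix in place of the question's random data. It also shows an alternative (a different technique from the answer above) that skips the data frame and cross-tabulates each value against its column index with table() and col().

```r
# Hypothetical 3 x 2 matrix standing in for the question's random data
mat <- matrix(c(1, 2, 2,
                3, 3, 3), nrow = 3)
mat.levels <- as.character(sort(unique(as.vector(mat))))

# Same approach as the answer: convert every column to a factor with shared levels
mat.factor <- as.data.frame(lapply(as.data.frame(mat), factor, levels = mat.levels))

# One integer count per level for every column; vapply checks the shape
counts <- vapply(mat.factor, function(col) as.vector(table(col)),
                 integer(length(mat.levels)))
rownames(counts) <- mat.levels
counts

# Alternative: cross-tabulate value against column index, straight from the matrix
counts2 <- table(value = factor(as.vector(mat), levels = mat.levels),
                 column = as.vector(col(mat)))
counts2
```

vapply is used rather than sapply so that a column with an unexpected number of levels raises an error instead of silently producing a list.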