Search code examples
rdataframeapplytabular

Table of column counts for a data frame


I have a data frame with columns of categorical data supplied as strings. The categories for each column are the same, e.g.:

myDF=data.frame(col1=sample(c("a","b","c"),10,replace=T),
                col2=sample(c("a","b","c"),10,replace=T),
                col3=sample(c("a","b","c"),10,replace=T))

I would like to generate a table of counts in each category by column.

When all columns have all categories included, this can be done with apply using the function table, for instance:

> myDF
   col1 col2 col3
1     a    c    a
2     b    b    b
3     a    a    b
4     b    b    a
5     c    c    a
6     a    a    a
7     a    c    c
8     a    a    c
9     c    a    a
10    a    a    b
> apply(myDF,2,table)
  col1 col2 col3
a    6    5    5
b    2    2    3
c    2    3    2

However, if one column is missing some categories, this doesn't work because table doesn't know what categories to expect:

myDF=data.frame(col1=sample(c("a","b","c"),10,replace=T),
                col2=sample(c("a","b","c"),10,replace=T),
                col3=sample(c("a","b"),10,replace=T))

Gives:

> myDF
   col1 col2 col3
1     c    a    a
2     a    a    b
3     b    a    a
4     c    c    a
5     c    a    a
6     c    c    a
7     c    b    a
8     c    b    b
9     a    a    a
10    b    b    a
> apply(myDF,2,table)    
$col1

a b c 
2 2 6 

$col2

a b c 
5 3 2 

$col3

a b 
8 2 

How can I produce a table that looks like the first one, with 0 for any missing categories?


Solution

  • You could collect all the factor levels and use those in apply:

    #get the levels from the whole data.frame
    all_levels <- levels(unlist(myDF))
    
    #convert each column to factor using the levels from above
    #and then use table (which will return a zero for any missing levels)
    apply(myDF, 2, function(x) table(factor(x, levels = all_levels)))
    

    Output:

      col1 col2 col3
    a    1    4    7
    b    5    2    3
    c    4    4    0
    
    > myDF
       col1 col2 col3
    1     b    a    a
    2     c    b    a
    3     c    c    b
    4     b    a    b
    5     b    c    a
    6     c    c    a
    7     c    b    a
    8     b    a    b
    9     a    c    a
    10    b    a    a