How do you make a multiple variable frequency table in R when not all values are present in all columns?


I'd like to make a frequency table like this in R:

df = data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,9,8,7))
apply(df, 2, table)

# outputs:  
#   aa bb cc
# 7  1  1  2
# 8  2  1  1
# 9  1  2  1

But if one of the columns of df has a count of 0 for some value (e.g. if we change the above so that df$cc contains no 9), we get a list instead of a nice dataframe.

# example that gives a list
df = data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,8,8,7))
apply(df, 2, table)
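
As a minimal sketch, the following check shows why the result degrades to a list. The variable name `res` is introduced here only for illustration.

```r
df <- data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,8,8,7))
res <- apply(df, 2, table)

# apply() only simplifies to a matrix when every per-column result has the
# same length; here cc has two distinct values while aa and bb have three
is.list(res)   # TRUE
lengths(res)   # aa: 3, bb: 3, cc: 2
```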

What's a simple way to do something similar that will guarantee dataframe output regardless of the counts?

I can imagine a number of solutions that seem messy or hacked, for example, this produces the desired result:

# example of a messy but correct solution
df = data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,8,8,7))
apply(df, 2, function(x) summary(factor(x, levels = unique(unlist(df)))))

Is there a cleaner way to do this?


Solution

  • I'll go ahead and answer, though I still object to the lack of criteria. If we think of "tidy" as the opposite of "messy", then we should first tidy the input data into a long format. Then we can do a two-way table:

    library(tidyr)
    df %>% gather %>%
      with(table(value, key))
    #      key
    # value aa bb cc
    #     7  1  1  2
    #     8  2  1  2
    #     9  1  2  0
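
    In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. The sketch below should be equivalent, assuming a recent tidyr; the names `long` and `tab` are illustrative, and the `key`/`value` column names are chosen only to match the output above.

    ```r
    library(tidyr)

    df <- data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,8,8,7))

    # Reshape every column into (key, value) pairs, one row per original cell
    long <- pivot_longer(df, cols = everything(),
                         names_to = "key", values_to = "value")

    # Cross-tabulate values against the originating column; value/column
    # combinations that never occur are reported as 0
    tab <- with(long, table(value, key))
    tab
    ```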
    

    Thanks to Markus for a base R version:

    table(stack(df))
    #       ind
    # values aa bb cc
    #      7  1  1  2
    #      8  2  1  2
    #      9  1  2  0
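
    Both versions return a `table` object rather than a data frame. If a data frame is specifically required, one possible follow-up (a sketch, assuming the wide layout should be kept; `tab` and `wide` are illustrative names) is `as.data.frame.matrix()`:

    ```r
    df <- data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,8,8,7))
    tab <- table(stack(df))

    # as.data.frame.matrix() preserves the values-by-column layout; plain
    # as.data.frame() on a table would return the long (values, ind, Freq) form
    wide <- as.data.frame.matrix(tab)
    is.data.frame(wide)   # TRUE
    wide["9", "cc"]       # 0
    ```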