R: Quickest way to summarize number of observations for multiple variables


I am sure this is a super simple thing, but I cannot find a really quick and easy solution.

I have patient data with a lot of columns in a format like this:

patID   disease   category ...
1       1          A
2       0          B
3       1          C
4       1          B

How can I quickly produce a summary table that gives the number of observations for each value of every column/variable in the dataframe? The result should look something like this:

VARIABLE     Number of rows
disease:1    3
disease:0    1
category:A   1
category:B   2
category:C   1
...

I know I can do this for a single variable by just using table(data$column). But how can I produce something similar for all columns in a dataframe?


Solution

  • Using tidyr and dplyr:

    library(dplyr)
    library(tidyr)
    gather(data, variable, value, -patID) %>%
      count(variable, value)
    

    (Thanks @Frank for reminding me about tally and count.)
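
    In current tidyr, gather() is superseded by pivot_longer(), so a rough equivalent might look like the sketch below. It assumes tidyr 1.1.0 or later, and the data frame is rebuilt by hand from the sample rows in the question (the real data has more columns). Since disease is numeric and category is character, the values are converted to character first; otherwise pivot_longer() refuses to combine the two types into one column.

```r
library(dplyr)
library(tidyr)

# Sample rows copied from the question; the real data has more columns.
data <- data.frame(
  patID    = 1:4,
  disease  = c(1, 0, 1, 1),
  category = c("A", "B", "C", "B")
)

counts <- data %>%
  # Long format: one row per patient/variable pair.
  # values_transform coerces every value to character so that
  # numeric and character columns can share the "value" column.
  pivot_longer(
    -patID,
    names_to = "variable",
    values_to = "value",
    values_transform = list(value = as.character)
  ) %>%
  # One row per distinct variable/value pair, with its count in column n.
  count(variable, value)

print(counts)
```

    If a single label column in the style of disease:1 is wanted, tidyr::unite() with sep = ":" could be applied to variable and value afterwards.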