Search code examples
rcategorical-data

How to make a list with counts and percentages of different categorical variables in R


I am using the following code to get counts and percentages of the proportions of the different levels of a categorical variable.

table(df$var1)

table(df$var1)%>%
prop.table()

The issue is that I have more than 50 categorical variables in my dataset and manually doing this for each one is too time-consuming. Also, I have to note it down in my report on a table. I was wondering if it was possible to include different variables in a code and then the output is a table with multiple categorical variables with counts and percentages. Something like the attached picture in this question. Output table


Solution

  • Here's an example using dplyr. You use select() to choose the categorical variables, which can be done by name, column number or one of the tidy select functions (e.g., matches(), starts_with(), ends_with(), contains()). The rest just makes the counts and turns them into proportions. In the code, all you would have to change is the input data frame and the variables in the call to select().

    library(dplyr)
    library(tidyr)
    data(diamonds, package="ggplot2")
    diamonds %>% 
      select(cut:clarity) %>%
      mutate(across(everything(), as.character)) %>% 
      pivot_longer(everything(), values_to="category", names_to="variable") %>% 
      group_by(variable, category) %>% 
      tally() %>%
      group_by(variable) %>% 
      mutate(prop = n/sum(n))
    #> # A tibble: 20 × 4
    #> # Groups:   variable [3]
    #>    variable category      n   prop
    #>    <chr>    <chr>     <int>  <dbl>
    #>  1 clarity  I1          741 0.0137
    #>  2 clarity  IF         1790 0.0332
    #>  3 clarity  SI1       13065 0.242 
    #>  4 clarity  SI2        9194 0.170 
    #>  5 clarity  VS1        8171 0.151 
    #>  6 clarity  VS2       12258 0.227 
    #>  7 clarity  VVS1       3655 0.0678
    #>  8 clarity  VVS2       5066 0.0939
    #>  9 color    D          6775 0.126 
    #> 10 color    E          9797 0.182 
    #> 11 color    F          9542 0.177 
    #> 12 color    G         11292 0.209 
    #> 13 color    H          8304 0.154 
    #> 14 color    I          5422 0.101 
    #> 15 color    J          2808 0.0521
    #> 16 cut      Fair       1610 0.0298
    #> 17 cut      Good       4906 0.0910
    #> 18 cut      Ideal     21551 0.400 
    #> 19 cut      Premium   13791 0.256 
    #> 20 cut      Very Good 12082 0.224
    

    Created on 2023-05-16 with reprex v2.0.2