Tags: r, dplyr, categorical-data, baseline

Is there a function in dplyr/forcats to display count and percentages from a dataframe of dichotomous variables?


I frequently get stuck when I want to summarise categorical variables in my dataset. My dataset contains dichotomous variables (yes/no) per patient. In the example below, "A" to "C" are risk factors that a person does or does not have.

A <- c("yes", "no", "yes", "no", "yes")
B <- c("no", "no", "yes", "yes", "no")
C <- c("yes", "no", "yes", "no", "yes")

df <- data.frame(A, B, C)

What I am trying to do is summarise all variables as factor-level counts and percentages, with one line of code. I tried using apply, forcats and dplyr, but can't get it right. Can anyone help me? :)

I am hoping to get:

A: Yes 3 | %
   No  2 | %
B: ...
C: ...

The ultimate goal is to make a big summary table of baseline characteristics of a study population with both continuous and categorical variables. I will probably try to use CBCgrps or tableone.

Thank you!


Solution

  • You can use forcats::fct_count():

    library(purrr)
    library(forcats)
    
    map_df(df, fct_count, prop = TRUE, .id = "var")
    
    # A tibble: 6 x 4
      var   f         n     p
      <chr> <fct> <int> <dbl>
    1 A     no        2   0.4
    2 A     yes       3   0.6
    3 B     no        3   0.6
    4 B     yes       2   0.4
    5 C     no        2   0.4
    6 C     yes       3   0.6
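
Note that `p` in the output above is a proportion, not a percentage. If you would rather stay within dplyr/tidyr and get a percentage column directly, something like the following sketch may work. It assumes every column of `df` is a yes/no variable; the names `level`, `pct` and `summary_tbl` are made up for illustration and are not part of any package.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  A = c("yes", "no", "yes", "no", "yes"),
  B = c("no", "no", "yes", "yes", "no"),
  C = c("yes", "no", "yes", "no", "yes")
)

summary_tbl <- df %>%
  # One row per (variable, value) pair
  pivot_longer(everything(), names_to = "var", values_to = "level") %>%
  # Count each level within each variable
  count(var, level) %>%
  # Convert counts to percentages within each variable
  group_by(var) %>%
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  ungroup()

print(summary_tbl)
```

This gives one row per variable and level, with `n` as the count and `pct` as the percentage within that variable. For the full baseline table with continuous variables as well, a dedicated package such as tableone is probably still the more convenient route.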