I frequently get stuck when I want to summarise categorical variables in my dataset. My dataset contains dichotomous variables (yes/no) per patient. In the example below, "A" to "C" are risk factors that the person does or does not have.
A <- c("yes", "no", "yes", "no", "yes")
B <- c("no", "no", "yes", "yes", "no")
C <- c("yes", "no", "yes", "no", "yes")
df <- data.frame(A, B, C)
What I am trying to do is summarise all variables as factor-level counts and percentages, with one line of code. I tried using apply, forcats, and dplyr but can't get it right. Can anyone help me? :)
I am hoping to get:
A : Yes 3 | %
No 2 | %
B: ..
C..
The ultimate goal is to make a big summary table of baseline characteristics of a study population with both continuous and categorical variables. I will probably try to use CBCgrps or tableone.
Thank you!
You can use forcats::fct_count(), mapped over the columns of the data frame with purrr::map_df():
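library(purrr)    # map_df()
library(forcats)  # fct_count()
# Count the levels of every column; prop = TRUE adds the proportion column p,
# and .id = "var" stores the column name alongside each result
map_df(df, fct_count, prop = TRUE, .id = "var")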
# A tibble: 6 x 4
  var   f         n     p
  <chr> <fct> <int> <dbl>
1 A     no        2   0.4
2 A     yes       3   0.6
3 B     no        3   0.6
4 B     yes       2   0.4
5 C     no        2   0.4
6 C     yes       3   0.6
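
If you would rather avoid extra packages, the same counts can probably be built in base R with table() and prop.table(). This is only a minimal sketch using the example data from the question. The column names level, n and pct and the object name res are my own choices for illustration, not anything required by R.

```r
# Example data from the question
A <- c("yes", "no", "yes", "no", "yes")
B <- c("no", "no", "yes", "yes", "no")
C <- c("yes", "no", "yes", "no", "yes")
df <- data.frame(A, B, C)

# For each column: count the levels, convert the counts to percentages,
# and return one small data frame per variable
per_var <- lapply(names(df), function(v) {
  counts <- table(df[[v]])
  data.frame(
    var   = v,                                            # variable name
    level = names(counts),                                # "no" / "yes"
    n     = as.integer(counts),                           # count per level
    pct   = round(100 * as.numeric(prop.table(counts)), 1) # percentage, 0-100
  )
})

# Stack the per-variable results into one long table
res <- do.call(rbind, per_var)
res
```

Note that pct here is on a 0 to 100 scale, whereas the p column from fct_count() is a proportion between 0 and 1.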