Search code examples
rdplyrqwraps2

Combining count and percentage in a Table (Categorical variable)


I am using R markdown and qwraps2 and dplyr packages

library(dplyr)
library(qrwaps2)

I have a dataset with one categorical variable and 2 numeric variables that I want to get the count and percentage of the categorical variable in the following format: count (percentage %)

The dataset is below:

structure(list(SIDE = c("Left", "Right", "Left", "Right", "Left", 
"Right", "Right", "Right", "Right", "Right", "Left", "Left", 
"Left", "Right", "Left", "Right", "Right", "Left", "Left", "Left", 
"Left", "Right", "Right"), PREOP_mTFA = c(163.5, 164.9, 168.7, 
170.3, 162.8, 166.7, 171, 165.9, 165.9, 170.8, 170.5, 173.3, 
167.7, 170.7, 159, 170.9, 168.2, 171.2, 164, 166.6, 169.1, 171.2, 
175.9), PREOP_mLDFA = c(86, 95, 90, 86, 92, 89, 92, 96, 90, 86, 
89, 87, 93, 90, 98, 89, 90, 88, 92, 91, 89, 90, 88)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -23L))

I used the following code to get count and percentage but they come as separate.

DLO_TEST %>% group_by(SIDE) %>% 
  summarise(count = n())
DLO_TEST %>% group_by(SIDE) %>% 
  summarise(percent = 100 * n() / nrow(DLO_TEST))

I want to have one output in the format of count then percentage in parentheses with percentage sign% : count (percentage %). How to modify the code to do that?

I tried

qwraps2::n_perc(DLO_TEST$SIDE == Right) 

This did not work. I think it works only for numeric variables.


Solution

  • Maybe something like this?

    DLO_TEST |>
      summarize(
        count = n(),
        .by = SIDE
      ) |>
      mutate(
        percent = scales::percent_format(accuracy = 0.1)(count / sum(count)),
        your_string = glue::glue("{count} ({percent})")
      )
    # # A tibble: 2 × 4
    #   SIDE  count percent your_string
    #   <chr> <int> <chr>   <glue>     
    # 1 Left     11 47.8%   11 (47.8%) 
    # 2 Right    12 52.2%   12 (52.2%) 
    

    If you have more categorical variables, then this exact format doesn't make sense as they may have different numbers of levels. You could maybe pivot your data to a long format first so they are all in one column. You may also be interested in the table1 package which does a nice job automating tables like this.