
Calculate the percentage of a value per group?


I have this data frame:

df.bar <- data.frame(diagnosis = c("A","A","A", "nb" ,"nb", "hg"),
  C1 = c(1,1,0,0,1,0), C2 = c(0,1,0,0,0,0))


    df.bar
      diagnosis C1 C2
    1         A  1  0
    2         A  1  1
    3         A  0  0
    4        nb  0  0
    5        nb  1  0
    6        hg  0  0

I want to calculate the percentage of ones in C1 and C2 for each diagnosis, like this:

      diagnosis  C1  C2
    1         A 66% 33%
    2        nb 50%  0%
    3        hg  0%  0%

Solution

    1. Base R solution with aggregate() (the \(x) lambda shorthand needs R 4.1 or later):
    aggregate(cbind(C1, C2) ~ diagnosis, df.bar,
              \(x) paste0(round(mean(x) * 100, 2), '%'))
    
    2. dplyr solution:
    library(dplyr)
    
    df.bar %>%
      group_by(diagnosis) %>%
      summarise(across(C1:C2, ~ paste0(round(mean(.x) * 100, 2), '%')))
    
    # # A tibble: 3 × 3
    #   diagnosis C1     C2
    #   <chr>     <chr>  <chr>
    # 1 A         66.67% 33.33%
    # 2 hg        0%     0%
    # 3 nb        50%    0%
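
Both solutions return the percentages as character strings, which is fine for display but awkward for sorting or plotting. As a minimal sketch of one alternative (the names `pct` and `pct_label` are made up for illustration, and rounding to whole percentages is an assumption, not something the question specifies), the values can be kept numeric and formatted only at the end:

```r
df.bar <- data.frame(
  diagnosis = c("A", "A", "A", "nb", "nb", "hg"),
  C1 = c(1, 1, 0, 0, 1, 0),
  C2 = c(0, 1, 0, 0, 0, 0)
)

# Share of ones per diagnosis, kept numeric: the mean of a 0/1 column
# is the proportion of ones, so multiplying by 100 gives a percentage.
pct <- aggregate(cbind(C1, C2) ~ diagnosis, data = df.bar,
                 FUN = function(x) mean(x) * 100)

# Format a copy for display only (hypothetical name `pct_label`);
# `pct` itself stays numeric for further computation.
pct_label <- pct
pct_label[c("C1", "C2")] <- lapply(pct[c("C1", "C2")],
                                   function(x) paste0(round(x), "%"))

print(pct_label)
```

Note that `round()` turns 66.67 into 67, not the 66 shown in the question; if truncation is really what is wanted, `floor()` could be swapped in.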