Search code examples
rdata-manipulationsummarize

How do I count the percentage makeup of TRUES in a table?


This is the code used to derive the first table in my question.

JH %>% group_by(ATT_ID, CAR=="B") %>%
summarize(count = n(), .groups = "drop")
ATT_ID CAR == "B" Count
ONE FALSE 1
TWO TRUE 1
THREE TRUE 3
THREE FALSE 5
FOUR FALSE 2
FIVE TRUE 4
SIX TRUE 8
SIX FALSE 4

How can I get the table above to look like:

ATT_ID Percentage of "B"
ONE 0%
TWO 100%
THREE 37.5%
FOUR 0%
FIVE 100%
SIX 67%
  • Notice how some ID's are seen twice so as to show the presence of both FALSE & TRUE whereas other ID's appear once to showcase the presence of only one or the other.

Thank you


Solution

  • Grouped by 'ATT_ID', get the sum of Count where CAR=="B" is TRUE and divide by the sum of full Count

    library(dplyr)
    df1 %>%
       group_by(ATT_ID = factor(ATT_ID, levels = unique(ATT_ID))) %>%
      summarise(Percentage_of_B = paste0(round(
          sum(Count[`CAR == "B"`])/sum(Count) * 100, 1), "%"))
    

    -output

    # A tibble: 6 × 2
      ATT_ID Percentage_of_B
      <fct>  <chr>          
    1 ONE    0%             
    2 TWO    100%           
    3 THREE  37.5%          
    4 FOUR   0%             
    5 FIVE   100%           
    6 SIX    66.7%     
    

    data

    df1 <- structure(list(ATT_ID = c("ONE", "TWO", "THREE", "THREE", "FOUR", 
    "FIVE", "SIX", "SIX"), `CAR == "B"` = c(FALSE, TRUE, TRUE, FALSE, 
    FALSE, TRUE, TRUE, FALSE), Count = c(1L, 1L, 3L, 5L, 2L, 4L, 
    8L, 4L)), class = "data.frame", row.names = c(NA, -8L))