r dplyr summarize

How do I get the percentage of 4 columns of yes/no data in R?


Suppose I have a data set with four columns that contain "Yes" and "No" responses.

Dataset: (https://docs.google.com/spreadsheets/d/1TLw-UG8WOlFQ3dCn4Kmdok6I2rJL3M31oYb4_a_AxjU/edit?usp=sharing)

I would like the final output to show the percentage of "Yes" and "No" responses for each of these columns. This is what I have tried so far:

open_data_portals <- ogd_research_project_csv_col_names_clean %>%
  dplyr::select(Opd_InstitutionWebsite,
                Opd_Portal,
                Opd_MobileApps,
                Opd_CustomerServicePortal) %>%
  dplyr::group_by(Opd_InstitutionWebsite,
                  Opd_Portal,
                  Opd_MobileApps,
                  Opd_CustomerServicePortal) %>%
  dplyr::summarise()

I have tried the above but am stuck at summarise. How can I proceed, or how can I solve this challenge?


Solution

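  • Reshape the data to long format, compute each response's share within its column, then pivot back to wide format:

    # Requires dplyr >= 1.1.0 (for the `.by` argument) and tidyr
    df |>
      pivot_longer(everything()) |>                    # one row per (column name, response)
      mutate(t = n(), .by = name) |>                   # t = number of responses per column
      summarise(
        n = paste0(round(n() / first(t) * 100), "%"),  # share of each response, e.g. "48%"
        .by = c(name, value)
      ) |>
      pivot_wider(names_from = value, values_from = n) # one column per response value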
    

    Output:

    # A tibble: 4 × 3
      name  No    Yes  
      <chr> <chr> <chr>
    1 a     52%   48%  
    2 b     47%   53%  
    3 c     53%   47%  
    4 d     41%   59%  
    

    Creating the sample df:

    set.seed(0)
    library(tidyverse)
    
    df <- tibble(
      a = sample(c("Yes", "No"), 123, replace = TRUE),
      b = sample(c("Yes", "No"), 123, replace = TRUE),
      c = sample(c("Yes", "No"), 123, replace = TRUE),
      d = sample(c("Yes", "No"), 123, replace = TRUE)
    )
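
To adapt this to the data in the question, something along the following lines might work. This is only a sketch: the `ogd` tibble is a small made-up stand-in (only the four `Opd_` column names come from the question), and it uses `mean(value == "Yes")` instead of the count-and-pivot approach above, which keeps the percentages numeric and still gives a value when a column contains only one of the two responses. It assumes the columns hold nothing other than "Yes", "No", or missing values.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the asker's data frame; the values are invented,
# only the column names are taken from the question.
ogd <- tibble(
  Opd_InstitutionWebsite    = c("Yes", "Yes", "No", "Yes"),
  Opd_Portal                = c("No", "No", "No", "Yes"),
  Opd_MobileApps            = c("Yes", "No", "Yes", "No"),
  Opd_CustomerServicePortal = c("No", "No", "No", "No")
)

pct <- ogd %>%
  select(starts_with("Opd_")) %>%
  # Long format: one row per (column name, response)
  pivot_longer(everything(), names_to = "name", values_to = "value") %>%
  group_by(name) %>%
  # The mean of a logical vector is the proportion of TRUE values;
  # na.rm = TRUE leaves missing responses out of the denominator.
  summarise(
    Yes = mean(value == "Yes", na.rm = TRUE) * 100,
    No  = mean(value == "No", na.rm = TRUE) * 100,
    .groups = "drop"
  )

print(pct)
```

If the formatted strings from the answer above are preferred, wrapping each percentage in `paste0(round(...), "%")` should give the same style of output.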