
Summarise multiple columns that have to be grouped (tidyverse)


I have a data frame containing data that looks something like this:

df <- data.frame(
    group1 = c("High","High","High","Low","Low","Low"),
    group2 = c("male","female","male","female","male","female"),
    one = c("yes","yes","yes","yes","no","no"), 
    two = c("no","yes","no","yes","yes","yes"), 
    three = c("yes","no","no","no","yes","yes")
)

I want to summarise the counts of yes/no in the variables one, two, and three, which I would normally do one column at a time with df %>% group_by(group1,group2,one) %>% summarise(n()). Is there a way to summarise all three columns and bind the results into one output data frame without manually repeating the code for each column? I've tried using a for loop, but I can't get group_by() to recognize the column name I give it as input.


Solution

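  • Reshape the data to long format with pivot_longer(), then count the yes/no values within each group1/group2 combination (pooled across one, two, and three):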

    library(dplyr)
    library(tidyr)
    
    # Stack one, two, three into name/value pairs, then count per group
    df %>%
        pivot_longer(cols = one:three) %>%
        count(group1, group2, value)
    
    #  group1 group2 value     n
    #  <chr>  <chr>  <chr> <int>
    #1 High   female no        1
    #2 High   female yes       2
    #3 High   male   no        3
    #4 High   male   yes       3
    #5 Low    female no        2
    #6 Low    female yes       4
    #7 Low    male   no        1
    #8 Low    male   yes       2
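
If the counts should stay separate for each of one, two, and three (which is what the per-column group_by() in the question would give), one possible variation is to add the name column that pivot_longer() creates by default to the count() call. The sketch below also shows a loop-style alternative that passes the column name as a string through the .data pronoun, which may be what the for loop attempt was missing. The object names cols, per_column, and per_column_loop are illustrative and not part of the original answer, and the code assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
    group1 = c("High","High","High","Low","Low","Low"),
    group2 = c("male","female","male","female","male","female"),
    one = c("yes","yes","yes","yes","no","no"),
    two = c("no","yes","no","yes","yes","yes"),
    three = c("yes","no","no","no","yes","yes")
)

# Long format: `name` holds the original column name, `value` the yes/no answer.
# Counting by `name` as well keeps one, two, and three separate.
per_column <- df %>%
    pivot_longer(cols = one:three) %>%
    count(group1, group2, name, value)

# Loop variant (illustrative): .data[[col]] lets group_by() use a column
# whose name is stored as a string.
cols <- c("one", "two", "three")
per_column_loop <- bind_rows(lapply(cols, function(col) {
    df %>%
        group_by(group1, group2, value = .data[[col]]) %>%
        summarise(n = n(), .groups = "drop") %>%
        mutate(name = col, .before = value)
}))
```

Both objects should hold the same counts, one row per group1/group2/name/value combination, though the row order differs: count() sorts by the grouping columns, while the loop variant stacks the results column by column.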