r, dplyr, summarize

dplyr summarise multiple variables based on condition


I would like to summarise each variable either as mean and SD, or as median with lower and upper quartiles, depending on whether the variable is normally distributed.

Using mtcars as an example, this is how I am doing one variable at a time:


library(dplyr)

sum = mtcars %>%
  group_by(am) %>%
  summarise(qsec = paste0(mean(qsec), " (", sd(qsec), ")"))

I'd like to do something like this

norm = c("qsec", "drat", "hp", "mpg")

sum= mtcars%>%
group_by(am)%>%
summarise(across(where(. %in% norm), . = paste0(mean(.,na.rm = T), " (", sd(.,na.rm=T) , ")") )
            )

and then add the corresponding line for the median and quartiles. I would also be happy with a for-loop solution followed by rbind.


Solution

  • I suppose you want to do something like this:

    library("dplyr")
    
    norm <- c("qsec", "drat", "hp", "mpg")
    
    my_summary <- mtcars |>
      group_by(am) |>
      summarise(
        across(
          all_of(norm),
          ~ paste0(mean(.x, na.rm = TRUE), "(sd=", sd(.x, na.rm = TRUE), ")")
        ),
        across(
          !all_of(norm),
          ~ paste0(median(.x, na.rm = TRUE), "(", quantile(.x, 1/4, na.rm = TRUE), " - ", quantile(.x, 3/4, na.rm = TRUE), ")")
        )
      )
    
    

    You can use all_of(norm) to select the columns named in norm, and negate it with !all_of(norm) to select all the remaining columns.
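
To see which columns each selector picks up, here is a minimal sketch using plain select() on mtcars instead of across(). That swap is only for illustration and is not part of the answer above; normal_cols and other_cols are hypothetical names.

```r
library(dplyr)

norm <- c("qsec", "drat", "hp", "mpg")

# all_of(norm) picks exactly the columns named in norm, in that order
normal_cols <- mtcars |> select(all_of(norm)) |> names()

# !all_of(norm) picks every column that is not named in norm
other_cols <- mtcars |> select(!all_of(norm)) |> names()

print(normal_cols)
print(other_cols)
```

Note that other_cols includes am here because the data is not grouped. Inside across() after group_by(am), grouping columns are left out of the selection, so am is not summarised.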