
R: how to summarise several variables with different expressions and then one expression for the rest


Imagine I have the following dataset:

Lines <- "id time sex Age A B C
1  1       male   90 0 0 0
1  2       male   91 0 0 0
1  3       male   92 1 1 0
2  1       female  87 0 1 1
2  2       female  88 0 1 0
2  3       female  89 0 0 1
3  1       male  50 0 1 0
3  2       male  51 1 0 0
3  3       male  52 0 0 0
4  1       female  54 0 1 0
4  2       female  55 0 1 0
4  3       female  56 0 1 0"

I would like to collapse the data frame to one row per id, keeping the first value of time, sex, and Age and the maximum value of the remaining variables (A, B, C). The expected result is:

Lines <- "id time sex Age A B C
1  1       male   90 1 1 0
2  1       female  87 0 1 1
3  1       male  50 1 1 0
4  1       female  54 0 1 0"

So far I have tried:

Lines %>%
   summarise(id = first(id), time = first(time), sex = first(sex),
   Age = first(Age), vars = max(vars))

I am struggling to find a single expression that covers all the remaining variables (the vars placeholder above) without naming each one.


Solution

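  • You could call across() twice inside summarize(): once to take the first value of the named columns, and once with a negated selection to take the maximum of everything else. The grouping column id is left out of across() automatically: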

    library(dplyr)
    
    Lines %>%
      read.table(text = ., header = TRUE) %>%
      group_by(id) %>%
      summarize(across(c(time, sex, Age), first),
                across(-c(time, sex, Age), max))
    

    returning

    # A tibble: 4 x 7
         id  time sex      Age     A     B     C
      <int> <int> <chr>  <int> <int> <int> <int>
    1     1     1 male      90     1     1     0
    2     2     1 female    87     0     1     1
    3     3     1 male      50     1     1     0
    4     4     1 female    54     0     1     0
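
If the list of "first value" columns is likely to change, the same idea can be sketched with the column names held in a character vector. This is only a variant of the answer above, not a different method: first_cols is a name made up for this example, and na.rm = TRUE is an added assumption in case the real data contains missing values (the sample data has none).

```r
library(dplyr)

Lines <- "id time sex Age A B C
1  1       male   90 0 0 0
1  2       male   91 0 0 0
1  3       male   92 1 1 0
2  1       female  87 0 1 1
2  2       female  88 0 1 0
2  3       female  89 0 0 1
3  1       male  50 0 1 0
3  2       male  51 1 0 0
3  3       male  52 0 0 0
4  1       female  54 0 1 0
4  2       female  55 0 1 0
4  3       female  56 0 1 0"

# Parse the text block into a data frame
df <- read.table(text = Lines, header = TRUE)

# Columns that keep their first value per id
# (first_cols is a hypothetical name used only in this sketch)
first_cols <- c("time", "sex", "Age")

result <- df %>%
  group_by(id) %>%
  summarise(
    # first value of the listed columns
    across(all_of(first_cols), first),
    # maximum of every other non-grouping column (here A, B, C);
    # na.rm = TRUE is an assumption, not part of the original answer
    across(-all_of(first_cols), ~ max(.x, na.rm = TRUE)),
    .groups = "drop"
  )

print(result)
```

With this layout, adding another column to the data needs no change to the code: it falls into the max() branch unless its name is added to first_cols.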