Imagine I have the following dataset:
Lines <- "id time sex Age A B C
1 1 male 90 0 0 0
1 2 male 91 0 0 0
1 3 male 92 1 1 0
2 1 female 87 0 1 1
2 2 female 88 0 1 0
2 3 female 89 0 0 1
3 1 male 50 0 1 0
3 2 male 51 1 0 0
3 3 male 52 0 0 0
4 1 female 54 0 1 0
4 2 female 55 0 1 0
4 3 female 56 0 1 0"
I would like to summarise the data frame by id so that time, sex, and Age take their first value within each id, while the remaining variables A, B, and C take their maximum value. The expected result is:
Lines <- "id time sex Age A B C
1 1 male 90 1 1 0
2 1 female 87 0 1 1
3 1 male 50 1 1 0
4 1 female 54 0 1 0"
So far I have tried:
Lines %>% Lines
summarise(id = first(patient_id), time = first(time), sex = first(sex),
Age = first(Age), vars = max(vars))
I am struggling to find an expression that covers the rest of the variables, which is what vars stands for above.
You could do
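library(dplyr)

Lines %>%
  read.table(text = ., header = TRUE) %>%
  group_by(id) %>%
  # first value per id for the descriptive columns
  summarize(across(c(time, sex, Age), first),
            # maximum per id for all remaining columns (A, B, C);
            # the grouping column id is skipped by across() automatically
            across(-c(time, sex, Age), max))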
returning
# A tibble: 4 x 7
     id  time sex      Age     A     B     C
  <int> <int> <chr>  <int> <int> <int> <int>
1     1     1 male      90     1     1     0
2     2     1 female    87     0     1     1
3     3     1 male      50     1     1     0
4     4     1 female    54     0     1     0
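
If you prefer to name the columns that get the maximum rather than exclude the others, a variant along these lines should also work. This is only a sketch: it assumes A, B, and C sit next to each other in the data frame so that the range A:C selects them, and the names df and result are mine, not from the question.

```r
library(dplyr)

# Same data as in the question, read into a data frame
df <- read.table(text = "id time sex Age A B C
1 1 male 90 0 0 0
1 2 male 91 0 0 0
1 3 male 92 1 1 0
2 1 female 87 0 1 1
2 2 female 88 0 1 0
2 3 female 89 0 0 1
3 1 male 50 0 1 0
3 2 male 51 1 0 0
3 3 male 52 0 0 0
4 1 female 54 0 1 0
4 2 female 55 0 1 0
4 3 female 56 0 1 0", header = TRUE)

result <- df %>%
  group_by(id) %>%
  summarise(
    across(c(time, sex, Age), first),  # first value within each id
    across(A:C, max),                  # maximum within each id
    .groups = "drop"                   # return an ungrouped tibble
  )

print(result)
```

Selecting A:C explicitly makes it obvious which columns are aggregated with max, at the cost of having to update the range if more indicator columns are added later. The exclusion form, across(-c(time, sex, Age), max), picks those up without changes.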