I have the following dataset:
Lines <- "id time sex Age Obs_A Obs_B Obs_C
1 1 male 90 0 0 0
1 2 male 91 0 0 0
1 3 male 92 1 1 0
2 1 female 87 0 1 1
2 2 female 88 0 1 0
2 3 female 89 0 0 1
3 1 male 50 0 1 0
3 2 male 51 1 0 0
3 3 male 52 0 0 0
4 1 female 54 0 1 0
4 2 female 55 0 1 0
4 3 female 56 0 1 0"
TTE <- read.table(text = Lines, header = TRUE)
I want to combine summarize with regular expressions (grepl) in order to summarise the variables whose names start with Obs (e.g. take the median) while performing other operations on the remaining variables. For example, something like this:
TTE <- TTE %>%
  group_by(id, across(where(is.character))) %>%
  summarise(id = first(id), sex = first(sex),
            Age = mean(Age), across(where(grepl("Obs")), mean), across(where(is.numeric), max)) %>%
  ungroup
Nonetheless, I get the following error:
x argument "x" is missing, with no default
Any idea on how to use summarize(), across(), where() and grepl() in a consistent way?
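The error arises because where() expects a predicate function that is applied to each column's values, whereas grepl("Obs") is evaluated immediately and fails because its second argument, x, is missing. To select columns by name with dplyr, use a tidyselect helper such as starts_with() inside across().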
library(dplyr)
TTE %>%
  group_by(id, across(where(is.character))) %>%
  summarise(Age = mean(Age),
            across(starts_with('Obs'), mean),
            across(where(is.numeric), max)) %>%
  ungroup()
#      id sex      Age Obs_A Obs_B Obs_C  time
#   <int> <chr>  <dbl> <dbl> <dbl> <dbl> <int>
# 1     1 male      91 0.333 0.333 0         3
# 2     2 female    88 0     0.667 0.667     3
# 3     3 male      51 0.333 0.333 0         3
# 4     4 female    55 0     1     0         3
Since you are grouping by id and by all the character columns, you don't need to handle them again inside summarise(): grouping columns are kept automatically, and across() never selects them.
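If you specifically want a regular expression rather than a fixed prefix, the tidyselect helper matches() can take the place of starts_with(). The following is a minimal sketch under a few assumptions: the data are read with read.table() as in the question, the pattern "^Obs_" is just one way to express "names beginning with Obs_", and the result name res and the explicit time = max(time) are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Rebuild the example data from the question.
TTE <- read.table(text = "id time sex Age Obs_A Obs_B Obs_C
1 1 male 90 0 0 0
1 2 male 91 0 0 0
1 3 male 92 1 1 0
2 1 female 87 0 1 1
2 2 female 88 0 1 0
2 3 female 89 0 0 1
3 1 male 50 0 1 0
3 2 male 51 1 0 0
3 3 male 52 0 0 0
4 1 female 54 0 1 0
4 2 female 55 0 1 0
4 3 female 56 0 1 0", header = TRUE)

res <- TTE %>%
  group_by(id, sex) %>%
  summarise(
    Age = mean(Age),
    # matches() selects columns whose *names* match a regular expression
    across(matches("^Obs_"), mean),
    # summarise the one remaining numeric column explicitly
    time = max(time),
    # drop the grouping in the same step instead of calling ungroup()
    .groups = "drop"
  )

res
```

Naming time explicitly, instead of relying on a trailing across(where(is.numeric), max), makes it clearer which summary applies to which column, at the cost of listing the columns by hand.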