
R: Using summarize(), across() and where() with regular expressions


I have the following dataset:

Lines <- "id time sex Age Obs_A Obs_B Obs_C
1  1       male   90 0 0 0
1  2       male   91 0 0 0
1  3       male   92 1 1 0
2  1       female  87 0 1 1
2  2       female  88 0 1 0
2  3       female  89 0 0 1
3  1       male  50 0 1 0
3  2       male  51 1 0 0
3  3       male  52 0 0 0
4  1       female  54 0 1 0
4  2       female  55 0 1 0
4  3       female  56 0 1 0"

I want to combine summarize with regular expressions (grepl) to summarise the variables that start with Obs (e.g. take the median) while applying other operations to the other variables. For example, something like this:

TTE <- TTE %>%
      group_by(id, across(where(is.character))) %>%
      summarise(id = first(id), sex = first(sex), 
                Age = mean(Age), across(where(grepl("Obs")), mean), across(where(is.numeric), max)) %>%
      ungroup 

Nonetheless, I get the following error:

x argument "x" is missing, with no default

Any idea on how to use summarize(), across(), where() and grepl() in a consistent way?


Solution

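  • where() expects a predicate function that is applied to each column's contents, not to its name; grepl("Obs") is also evaluated immediately without its x argument, which is what triggers the error. To select columns by name in dplyr, use a tidyselect helper such as starts_with() (or matches() for a regular expression) inside across().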

    library(dplyr)
    
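    # TTE is assumed to be the question's data, e.g. read.table(text = Lines, header = TRUE)
    TTE %>%
      group_by(id, across(where(is.character))) %>%
      summarise(Age = mean(Age),                    # mean age per group
                across(starts_with('Obs'), mean),   # mean of every Obs_* column
                # max of the numeric columns; those summarised above are already
                # single values, so only time effectively changes
                across(where(is.numeric), max)) %>%
      ungroup()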
    
    #     id sex      Age Obs_A Obs_B Obs_C  time
    #  <int> <chr>  <dbl> <dbl> <dbl> <dbl> <int>
    #1     1 male      91 0.333 0.333 0         3
    #2     2 female    88 0     0.667 0.667     3
    #3     3 male      51 0.333 0.333 0         3
    #4     4 female    55 0     1     0         3
    

    Since id and all the character columns are grouping variables, summarise() keeps them automatically, so you don't need id = first(id) or sex = first(sex).
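
If a regular expression is really needed rather than a simple prefix, the sketch below shows two ways to do it. It uses median because that is the example mentioned in the question, and it groups by id and sex explicitly. The names res_matches, res_grepl and obs_cols are illustrative and not part of the original answer; the data is read with read.table(), which is an assumption about how TTE was built.

```r
library(dplyr)

# Rebuild the example data from the question (assumed to be how TTE was created).
Lines <- "id time sex Age Obs_A Obs_B Obs_C
1  1       male   90 0 0 0
1  2       male   91 0 0 0
1  3       male   92 1 1 0
2  1       female  87 0 1 1
2  2       female  88 0 1 0
2  3       female  89 0 0 1
3  1       male  50 0 1 0
3  2       male  51 1 0 0
3  3       male  52 0 0 0
4  1       female  54 0 1 0
4  2       female  55 0 1 0
4  3       female  56 0 1 0"
TTE <- read.table(text = Lines, header = TRUE)

# Option 1: matches() takes a regular expression and tests it against column names.
res_matches <- TTE %>%
  group_by(id, sex) %>%
  summarise(Age = mean(Age),
            across(matches("^Obs_"), median),
            time = max(time),
            .groups = "drop")

# Option 2: run grepl() on the column names yourself, then pass the
# resulting character vector to across() through all_of().
obs_cols <- names(TTE)[grepl("^Obs_", names(TTE))]
res_grepl <- TTE %>%
  group_by(id, sex) %>%
  summarise(Age = mean(Age),
            across(all_of(obs_cols), median),
            time = max(time),
            .groups = "drop")
```

Both versions should give the same result. matches() is usually the shorter choice, while the grepl() route can be handy when the column list is computed elsewhere. Naming time explicitly, instead of using across(where(is.numeric), max), avoids re-summarising columns that were already handled.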