Tags: r, non-standard-evaluation, tidyselect

How can I use tidyselect to pass an array of symbols like pivot_longer?


Suppose I have some data that varies by time and location, constructed as follows:

library(tidyverse)

dts <- seq(
  ymd_hms('2023-01-01 00:00:00'),
  ymd_hms('2025-01-01 00:00:00'),
  by = '1 min'
)

locs <- tribble(
  ~'continent', ~'country', ~'city',
  'NA', 'CA', 'Toronto',
  'NA', 'US', 'Los Angeles',
  'EU', 'UK', 'London',
  'EU', 'FR', 'Paris'
)


d <- crossing(locs, dates=dts) %>% 
     mutate(
       second = second(dates),
       min = minute(dates),
       hour = hour(dates),
       yday = yday(dates),
       year = year(dates),   # needed by the rollup() call that groups by year
       month = month(dates, label = TRUE),
       y = runif(n())
     )

Created on 2024-02-18 with reprex v2.0.2

My goal is to create a function called rollup that lets the user group by time dimensions and location dimensions. Ideally, the user could pass the dimensions as bare column names, like this:

rollup(d, time_dims = c(year, month), loc_dims = c(country))

and the output would be the result of

    d %>%
      group_by(all the variables in loc dims and time dims) %>% 
      summarise(y = mean(y))

If the arguments are character vectors, this is straightforward:

rollup <- function(.data, time_dims, loc_dims) {
  
    .data %>% 
      group_by_at(c(time_dims, loc_dims)) %>% 
      summarise(y = mean(y))
  
}

rollup(d, time_dims = c('min', 'hour'), loc_dims = c('country'))
`summarise()` has grouped output by 'min', 'hour'. You can override using the
`.groups` argument.
# A tibble: 5,760 × 4
# Groups:   min, hour [1,440]
     min  hour country     y
   <int> <int> <chr>   <dbl>
 1     0     0 CA      0.496
 2     0     0 FR      0.514
 3     0     0 UK      0.500
 4     0     0 US      0.504
 5     0     1 CA      0.511
 6     0     1 FR      0.509
 7     0     1 UK      0.489
 8     0     1 US      0.505
 9     0     2 CA      0.486
10     0     2 FR      0.484
# ℹ 5,750 more rows
# ℹ Use `print(n = ...)` to see more rows
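
As an aside, the scoped verbs such as group_by_at() are superseded as of dplyr 1.0.0, so the character-vector version can also be written with across() and all_of(). The following is only a minimal sketch: `toy` is a hypothetical small stand-in for `d` so the example runs quickly, and `rollup_chr` is an illustrative name, not something from the original code.

```r
library(dplyr)

# Hypothetical small stand-in for `d` (assumption, not the original data).
toy <- tibble(
  country = c("CA", "CA", "US", "US"),
  hour    = c(0L, 0L, 0L, 1L),
  min     = c(0L, 0L, 0L, 0L),
  y       = c(1, 3, 2, 4)
)

# Illustrative name; takes character vectors, like the version above.
rollup_chr <- function(.data, time_dims, loc_dims) {
  .data %>%
    # all_of() tells tidyselect that the strings are column names
    group_by(across(all_of(c(time_dims, loc_dims)))) %>%
    summarise(y = mean(y), .groups = "drop")
}

res_chr <- rollup_chr(toy, time_dims = c("min", "hour"), loc_dims = "country")
res_chr
```

Passing `.groups = "drop"` also avoids the grouping message that summarise() prints above.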

What if I wanted to pass bare column names (symbols) instead, the way pivot_longer() accepts them? How can I modify rollup to accept a vector of symbols for time_dims and loc_dims?


Solution

  • You can use across() inside group_by() to get tidyselect semantics inside your function. Embracing each argument with {{ }} forwards the user's unevaluated selection to across():

    library(dplyr)
    
    rollup <- function(.data, time_dims = NULL, loc_dims = NULL) {
      
      .data %>% 
        group_by(across(c({{time_dims}}, {{loc_dims}}))) %>% 
        summarise(y = mean(y), .groups = "drop")
      
    }
    
    rollup(d, time_dims = c(hour, min), loc_dims = country)
    
    # A tibble: 5,760 × 4
        hour   min country     y
       <int> <int> <chr>   <dbl>
     1     0     0 CA      0.487
     2     0     0 FR      0.504
     3     0     0 UK      0.501
     4     0     0 US      0.492
     5     0     1 CA      0.500
     6     0     1 FR      0.512
     7     0     1 UK      0.504
     8     0     1 US      0.491
     9     0     2 CA      0.492
    10     0     2 FR      0.486
    # ℹ 5,750 more rows
    # ℹ Use `print(n = ...)` to see more rows
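
A possible variation, assuming dplyr 1.1.0 or later: pick() can take the place of across() for selecting grouping columns, and because the arguments go through tidyselect, helpers such as starts_with() work too. This is a sketch only: `toy` is a hypothetical small data set and `rollup_pick` is an illustrative name; both arguments are required here to keep the example short.

```r
library(dplyr)

# Hypothetical small data set (assumption, not the original `d`).
toy <- tibble(
  continent = c("NA", "NA", "EU", "EU"),
  country   = c("CA", "CA", "UK", "FR"),
  hour      = c(0L, 0L, 0L, 1L),
  min       = c(0L, 0L, 0L, 0L),
  y         = c(1, 3, 2, 4)
)

# Illustrative name; same idea as rollup(), using pick() instead of across().
rollup_pick <- function(.data, time_dims, loc_dims) {
  .data %>%
    # {{ }} forwards the caller's tidyselect expressions to pick()
    group_by(pick({{ time_dims }}, {{ loc_dims }})) %>%
    summarise(y = mean(y), .groups = "drop")
}

# Bare column names, as in the question
res_syms <- rollup_pick(toy, time_dims = c(hour, min), loc_dims = country)

# tidyselect helpers are accepted as well
res_help <- rollup_pick(toy, time_dims = hour, loc_dims = starts_with("c"))
```

Here `res_help` is grouped by hour, continent, and country, since starts_with("c") matches both location columns in `toy`.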