Suppose I have some data that varies by time and location, as below:
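library(tidyverse)  # tidyverse >= 2.0.0 also attaches lubridate

# One timestamp per minute over two years
dts <- seq(
  ymd_hms('2023-01-01 00:00:00'),
  ymd_hms('2025-01-01 00:00:00'),
  by = '1 min'
)

# Location hierarchy: continent > country > city
locs <- tribble(
  ~continent, ~country, ~city,
  'NA', 'CA', 'Toronto',
  'NA', 'US', 'Los Angeles',
  'EU', 'UK', 'London',
  'EU', 'FR', 'Paris'
)

# Every location crossed with every timestamp, plus derived time dimensions
d <- crossing(locs, dates = dts) %>%
  mutate(
    second = second(dates),
    min    = minute(dates),
    hour   = hour(dates),
    yday   = yday(dates),
    month  = month(dates, label = TRUE),
    year   = year(dates),
    y      = runif(n())
  )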
Created on 2024-02-18 with reprex v2.0.2
My goal is to create a function called rollup where the user can group by time dimensions and location dimensions. Ideally, the user could pass the arguments like
rollup(d, time_dims = c(year, month), loc_dims = c(country))
and the output would be the same as
d %>%
  group_by(year, month, country) %>%  # all the variables in time_dims and loc_dims
  summarise(y = mean(y))
If the arguments are character vectors, this is straightforward:
rollup <- function(.data, time_dims, loc_dims) {
  .data %>%
    # group_by_at() takes a character vector of column names
    group_by_at(c(time_dims, loc_dims)) %>%
    summarise(y = mean(y))
}
rollup(d, time_dims = c('min', 'hour'), loc_dims = c('country'))
`summarise()` has grouped output by 'min', 'hour'. You can override using the
`.groups` argument.
# A tibble: 5,760 × 4
# Groups: min, hour [1,440]
min hour country y
<int> <int> <chr> <dbl>
1 0 0 CA 0.496
2 0 0 FR 0.514
3 0 0 UK 0.500
4 0 0 US 0.504
5 0 1 CA 0.511
6 0 1 FR 0.509
7 0 1 UK 0.489
8 0 1 US 0.505
9 0 2 CA 0.486
10 0 2 FR 0.484
# ℹ 5,750 more rows
# ℹ Use `print(n = ...)` to see more rows
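group_by_at() is marked as superseded in current dplyr documentation. A minimal sketch of the same character-vector approach using across() with all_of() instead, assuming dplyr 1.0.0 or later; rollup_chr and the small toy data frame are hypothetical names used only for illustration:

```r
library(dplyr)

# Character-vector version: all_of() turns a character vector of column
# names into a tidyselect selection and errors if any name is missing.
rollup_chr <- function(.data, time_dims, loc_dims) {
  .data %>%
    group_by(across(all_of(c(time_dims, loc_dims)))) %>%
    summarise(y = mean(y), .groups = "drop")
}

# Tiny hypothetical stand-in for d, for quick experiments
toy <- tibble(
  hour    = c(0L, 0L, 1L, 1L),
  country = c("CA", "CA", "CA", "US"),
  y       = c(1, 3, 5, 7)
)

rollup_chr(toy, time_dims = "hour", loc_dims = "country")
```

all_of() is strict: a misspelled column name raises an error rather than being silently ignored (any_of() would ignore it), which is usually the safer behaviour for a helper like this.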
What if I wanted to pass bare column names (symbols) instead, the way pivot_longer() accepts them in its cols argument? How can I modify rollup so that time_dims and loc_dims accept symbols such as c(year, month)?
You can use across() inside group_by(), embracing the arguments with {{ }}, to get tidyselect semantics inside your function:
library(dplyr)

rollup <- function(.data, time_dims = NULL, loc_dims = NULL) {
  .data %>%
    # {{ }} forwards the caller's bare column names to tidyselect
    group_by(across(c({{ time_dims }}, {{ loc_dims }}))) %>%
    summarise(y = mean(y), .groups = "drop")
}
rollup(d, time_dims = c(hour, min), loc_dims = country)
# A tibble: 5,760 × 4
hour min country y
<int> <int> <chr> <dbl>
1 0 0 CA 0.487
2 0 0 FR 0.504
3 0 0 UK 0.501
4 0 0 US 0.492
5 0 1 CA 0.500
6 0 1 FR 0.512
7 0 1 UK 0.504
8 0 1 US 0.491
9 0 2 CA 0.492
10 0 2 FR 0.486
# ℹ 5,750 more rows
# ℹ Use `print(n = ...)` to see more rows
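Since dplyr 1.1.0 there is also pick(), which the dplyr 1.1.0 release notes recommend over across() calls that supply no .fns. A minimal sketch of the same function built on pick(), assuming dplyr >= 1.1.0; rollup_pick, toy, by_helper and by_hand are hypothetical names for illustration. Because the arguments are forwarded with {{ }}, any tidyselect expression works, including helpers such as starts_with():

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

rollup_pick <- function(.data, time_dims = NULL, loc_dims = NULL) {
  .data %>%
    # pick() returns the selected columns; group_by() groups by all of them
    group_by(pick(c({{ time_dims }}, {{ loc_dims }}))) %>%
    summarise(y = mean(y), .groups = "drop")
}

# Tiny hypothetical stand-in for d
toy <- tibble(
  hour    = c(0L, 0L, 1L, 1L),
  min     = c(0L, 0L, 0L, 0L),
  country = c("CA", "CA", "CA", "US"),
  y       = c(1, 3, 5, 7)
)

# Tidyselect helpers work because the selection is forwarded, not evaluated
by_helper <- rollup_pick(toy, time_dims = c(hour, min),
                         loc_dims = starts_with("coun"))

# Same grouping written out by hand, for comparison
by_hand <- toy %>%
  group_by(hour, min, country) %>%
  summarise(y = mean(y), .groups = "drop")
```

Choosing pick() over across() here is mostly a matter of style and dplyr version; both pass the caller's selection through to tidyselect in the same way.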