The crux of the problem is how to pass in a column variable into a grouped df to conditionally sum data. Data for the example follows:
library(dplyr)
library(rlang)
set.seed(1)
# dummy dates
date_vars <- purrr::map(c('2018-01-31', '2018-02-28', '2018-03-31',
'2018-04-30', '2018-05-31', '2018-06-30',
'2018-07-31', '2018-08-31', '2018-09-30',
'2018-10-31', '2018-11-30', '2018-12-31'), as.Date) %>%
purrr::reduce(c)
dummy_df <- tibble(
id = rep(c("a", "b", "c"), each = 12),
date = rep(date_vars, 3),
value = runif(36, 1, 10)
)
The function below will take a data frame, group by a variable (using rlang's sym function), then create a new summary column by adding all values where the date is greater or equal to some date period. Here I am summing 3 months of 'values'.
agg_by_period <- function(df, date_period, period, grouping, new_col_prefix){
grouping_vars <- syms(grouping)
new_sum_column <- quo_name(paste0(new_col_prefix, "sum_", period, 'm'))
df %>%
group_by(!!!grouping_vars) %>%
summarize(!!new_sum_column := sum(value[date >= date_period], na.rm = T)) %>%
select(!!!grouping_vars, !!sym(new_sum_column))
}
agg_by_period(df = dummy_df,
date_period = as.Date('2018-10-31'),
grouping = 'id',
period = 3,
new_col_prefix = 'new_'
)
# A tibble: 3 x 2
id new_sum_3m
<chr> <dbl>
1 a 7.00
2 b 11.9
3 c 18.1
Great! My question is specific to making 'value' in the function dynamic when this column is named something other than "value". My naive attempt to pass in this column using sym() and its error follows:
agg_by_period2 <- function(df, date_period, period, grouping, new_col_prefix,
value_var){
grouping_vars <- syms(grouping)
new_sum_column = quo_name(paste0(new_col_prefix, "sum_", period, 'm'))
value_var_col <- sym(value_var)
df %>%
group_by(!!!grouping_vars) %>%
summarize(!!new_sum_column := sum(!!value_var_col[date >= date_period], na.rm = T)) %>%
select(!!!grouping_vars, !!sym(new_sum_column))
}
agg_by_period2(df = dummy_df,
date_period = as.Date('2018-10-31'),
grouping = 'id',
period = 3,
new_col_prefix = 'new_',
value_var = 'value'
)
Error in `>=.default`(date, date_period) :
comparison (5) is possible only for atomic and list types
The above function will work when removing the date criteria ([date >= date_period]). Any help would be greatly appreciated.
This appears to be an order-of-operations problem with !!
and [
. Looks like you just need to wrap the splice in parenthesis
df %>%
group_by(!!!grouping_vars) %>%
summarize(!!new_sum_column := sum((!!value_var_col)[date >= date_period], na.rm = T)) %>%
select(!!!grouping_vars, !!sym(new_sum_column))
note the (!!value_var_col)
rather than just !!value_var_col
. This way the splicing will happen before the subsetting.