I was reading dplyr's vignette to figure out how to use dplyr inside my own functions. Midway through, it explains how to use enquos on ... in order to pass multiple arguments to group_by. A short example of how it would work:
grp <- rlang::enquos(...)
df %>%
  group_by(!!!grp)
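For completeness, here is a minimal runnable sketch of that pattern wrapped in a function. The name count_by and the use of mtcars are assumptions made purely for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: group by whatever bare column names arrive in `...`
count_by <- function(df, ...) {
  grp <- rlang::enquos(...)   # capture each argument in `...` as a quosure
  df %>%
    group_by(!!!grp) %>%      # splice the list of quosures into group_by()
    summarise(n = n(), .groups = "drop")
}

res <- count_by(mtcars, cyl, am)
```

The catch is that ... is now reserved for the grouping columns, which is what the rest of this question is about.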
I didn't know whether there was a way to pass multiple expressions through a single named argument, without reserving ... for that purpose and without resorting to some questionable coding.
To get an idea of what the call would look like, consider the following example:
# reproducible data
df <- datasets::USJudgeRatings
df$name <- rownames(df)
df <- tidyr::gather(df, key = "key", value = "value", -name)
df$dummy <- c("1","2")
test_summarize <- function(df, sum.col, grp = NULL, filter = NULL) {
  filter <- rlang::enquo(filter)
  sum.col <- rlang::enquo(sum.col)
  if (!is.null(rlang::get_expr(filter))) {
    df <- dplyr::filter(df, !!filter)
  }
  # how grp is turned into a character vector to be passed to .dots in group_by
  grp <- substitute(grp)
  if (!is.null(grp)) {
    grp <- deparse(grp)
    grp <- strsplit(gsub(pattern = "list\\(|c\\(|\\)|", replacement = "", x = grp), split = ",")[[1]]
    grp <- gsub(pattern = "^ | $", replacement = "", x = grp)
    df %>%
      dplyr::group_by(.dots = grp) %>%
      dplyr::summarise(mean = mean(!!sum.col), sum = sum(!!sum.col), n = n())
  } else {
    df %>%
      dplyr::summarise(mean = mean(!!sum.col), sum = sum(!!sum.col), n = n())
  }
}
test_summarize(df, sum.col=value, grp = c(name, dummy))
# A tibble: 86 x 5
# Groups: name [?]
name dummy mean sum n
<chr> <fct> <dbl> <dbl> <int>
1 AARONSON,L.H. 1 7.17 43 6
2 AARONSON,L.H. 2 7.42 44.5 6
3 ALEXANDER,J.M. 1 8.35 50.1 6
4 ALEXANDER,J.M. 2 7.95 47.7 6
5 ARMENTANO,A.J. 1 7.53 45.2 6
6 ARMENTANO,A.J. 2 7.7 46.2 6
7 BERDON,R.I. 1 8.67 52 6
8 BERDON,R.I. 2 8.25 49.5 6
9 BRACKEN,J.J. 1 5.65 33.9 6
10 BRACKEN,J.J. 2 5.82 34.9 6
# ... with 76 more rows
This works for what I was trying to do, but I was wondering whether there is a better way to accept the arguments and handle them. Every attempt I made to turn the original grp call into something resembling what enquos(...) returns failed, so I deparsed it and turned the pieces into a character vector. Honestly, should I just expect the user to pass characters?
I am opting not to use a character vector as the expected input because I want to stay consistent with the sum.col and filter arguments of the function, which expect NSE expressions. Maybe there is something in the rlang package that will convert each element of the original expression into a list of quosures?
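To make the idea concrete, here is a rough sketch of what such a conversion might look like using rlang's call-inspection helpers instead of deparse and gsub. The helper name split_grp is hypothetical, and it only handles the simple cases of a c(...) call, a list(...) call, a single expression, or NULL.

```r
library(rlang)

# Hypothetical helper: turn c(a, b), list(a, b), a single expression, or NULL
# into a list of quosures that could be spliced with !!!
split_grp <- function(grp = NULL) {
  q <- enquo(grp)            # capture the argument without evaluating it
  expr <- quo_get_expr(q)    # the captured expression, e.g. c(name, dummy)
  env <- quo_get_env(q)      # the environment the expression came from
  if (is.null(expr)) {
    list()                   # nothing supplied: no grouping expressions
  } else if (is_call(expr, c("c", "list"))) {
    # one quosure per argument of the c() or list() call
    lapply(call_args(expr), new_quosure, env = env)
  } else {
    list(q)                  # a single bare column name or expression
  }
}

grp_quos <- split_grp(c(name, dummy))
```

The resulting list could then be spliced the same way as the output of enquos(...), for example dplyr::group_by(df, !!!grp_quos).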
Edit: fixed reproducible example and provided expected output
If we use group_by_at, we may not need the if/else branch, because grp can then be passed as a character vector (or left as NULL):
test_summarize <- function(df, sum.col, grp = NULL, filter = NULL) {
  df %>%
    group_by_at(grp) %>%
    summarise(mean = mean({{sum.col}}),
              sum = sum({{sum.col}}), n = n())
}
test_summarize(df, sum.col=value, grp = c("name", "dummy"))
# A tibble: 86 x 5
# Groups: name [43]
# name dummy mean sum n
# <chr> <chr> <dbl> <dbl> <int>
# 1 AARONSON,L.H. 1 7.17 43 6
# 2 AARONSON,L.H. 2 7.42 44.5 6
# 3 ALEXANDER,J.M. 1 8.35 50.1 6
# 4 ALEXANDER,J.M. 2 7.95 47.7 6
# 5 ARMENTANO,A.J. 1 7.53 45.2 6
# 6 ARMENTANO,A.J. 2 7.7 46.2 6
# 7 BERDON,R.I. 1 8.67 52 6
# 8 BERDON,R.I. 2 8.25 49.5 6
# 9 BRACKEN,J.J. 1 5.65 33.9 6
#10 BRACKEN,J.J. 2 5.82 34.9 6
# … with 76 more rows
test_summarize(df, sum.col=value)
# A tibble: 1 x 3
# mean sum n
# <dbl> <dbl> <int>
#1 7.57 3908. 516
which is the same as
df %>%
summarise(mean = mean(value), sum = sum(value), n = n())
# mean sum n
#1 7.57345 3907.9 516
If we also want to filter, then one option is to use ... and pass as many filter conditions as needed:
test_summarize <- function(df, sum.col, grp = NULL, ...) {
  df %>%
    filter(!!! rlang::enexprs(...)) %>%
    group_by_at(grp) %>%
    summarise(mean = mean({{sum.col}}), sum = sum({{sum.col}}), n = n())
}
test_summarize(df, sum.col=value, grp = c("name", "dummy"),
key %in% c("CONT", "INTG"), value > 6.5)
# A tibble: 77 x 5
# Groups: name [43]
# name dummy mean sum n
# <chr> <chr> <dbl> <dbl> <int>
# 1 AARONSON,L.H. 2 7.9 7.9 1
# 2 ALEXANDER,J.M. 1 8.9 8.9 1
# 3 ALEXANDER,J.M. 2 6.8 6.8 1
# 4 ARMENTANO,A.J. 1 7.2 7.2 1
# 5 ARMENTANO,A.J. 2 8.1 8.1 1
# 6 BERDON,R.I. 1 8.8 8.8 1
# 7 BERDON,R.I. 2 6.8 6.8 1
# 8 BRACKEN,J.J. 1 7.3 7.3 1
# 9 BURNS,E.B. 1 8.8 8.8 1
#10 CALLAHAN,R.J. 1 10.6 10.6 1
# … with 67 more rows
and this will also work when no filter conditions are supplied
test_summarize(df, sum.col=value, grp = c("name", "dummy"))
# A tibble: 86 x 5
# Groups: name [43]
# name dummy mean sum n
# <chr> <chr> <dbl> <dbl> <int>
# 1 AARONSON,L.H. 1 7.17 43 6
# 2 AARONSON,L.H. 2 7.42 44.5 6
# 3 ALEXANDER,J.M. 1 8.35 50.1 6
# 4 ALEXANDER,J.M. 2 7.95 47.7 6
# 5 ARMENTANO,A.J. 1 7.53 45.2 6
# 6 ARMENTANO,A.J. 2 7.7 46.2 6
# 7 BERDON,R.I. 1 8.67 52 6
# 8 BERDON,R.I. 2 8.25 49.5 6
# 9 BRACKEN,J.J. 1 5.65 33.9 6
#10 BRACKEN,J.J. 2 5.82 34.9 6
# … with 76 more rows
which is the same as the first output
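If keeping bare column names for grp matters (as it does for sum.col), another option may be to embrace grp inside across(). This is only a sketch: it assumes dplyr 1.0.0 or later, the function name summarise_by is hypothetical, mtcars is used just to keep the example self-contained, and grp is made a required argument here for simplicity.

```r
library(dplyr)

# Hypothetical variant: grp takes bare names, e.g. c(cyl, am)
summarise_by <- function(df, sum.col, grp) {
  df %>%
    group_by(across({{ grp }})) %>%   # tidyselect handles c(cyl, am)
    summarise(mean = mean({{ sum.col }}),
              sum = sum({{ sum.col }}),
              n = n(),
              .groups = "drop")
}

res <- summarise_by(mtcars, sum.col = mpg, grp = c(cyl, am))
```

With this form the call stays consistent with the NSE style of sum.col, and no deparsing of grp is needed.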