I would like to make tables for publication that give the number of observations, grouped by two variables. The code for this works fine. However, I have run into problems when trying to turn this into a function.
I am using dplyr_0.7.2
Example using mtcars:
library(tidyverse)
tab1 <- mtcars %>% count(cyl) %>% rename(Total = n)
tab2 <- mtcars %>%
group_by(cyl, gear) %>% count %>%
spread(gear, n)
tab <- full_join(tab1, tab2, by = "cyl")
tab
# This is the output (which is what I want)
A tibble: 3 x 5
cyl Total `3` `4` `5`
<dbl> <int> <int> <int> <int>
1 4 11 1 8 2
2 6 7 2 4 1
3 8 14 12 NA 2
Function for tab1: this works
count_by_two_groups_A <- function(df, var1){
var1 <- enquo(var1)
tab1 <- df %>% count(!!var1) %>% rename(Total = n)
tab1
}
count_by_two_groups_A(mtcars, cyl)
A tibble: 3 x 2
cyl Total
<dbl> <int>
1 4 11
2 6 7
3 8 14
Function for first part of tab2: it works up to this point, but...
count_by_two_groups_B <- function(df, var1, var2){
var1 <- enquo(var1)
var2 <- enquo(var2)
tab2 <- df %>% group_by((!!var1), (!!var2)) %>% count
tab2
}
count_by_two_groups_B(mtcars, cyl, gear)
A tibble: 8 x 3
Groups: (cyl), (gear) [8]
`(cyl)` `(gear)` n
<dbl> <dbl> <int>
1 4 3 1
2 4 4 8
3 4 5 2
4 6 3 2
5 6 4 4
6 6 5 1
7 8 3 12
8 8 5 2
The column names have changed to (cyl) and (gear). I can't seem to figure out how to carry on with spread() and full_join() (or anything else using the new column names) now that the column names have changed. I.e. I can't figure out how to specify the new column names in the tidyeval way, to be able to carry on. I have tried various things, without success.
The usual way of setting names in a tidyeval context is to use the definition operator :=
. It would look like this:
df %>%
group_by(
!! nm1 := !! var1,
!! nm2 := !! var2
) %>%
count()
For this you need to extract nm1
from var1
. Unfortunately I don't have an easy way of stripping down the enclosing parentheses yet. I think it'd make sense to do it in the forthcoming function ensym()
(it captures symbols instead of quosures and issue an error if you supply a call). I have submitted a ticket here: https://github.com/tidyverse/rlang/issues/223
Fortunately we have two easy solutions here. First note that you don't need the enclosing parentheses. They are only needed when other operators are involved in the captured expression. E.g. in these situations:
(!! var) / avg
(!! var) < value
In this case if you omitted parentheses, !!
would try to unquote the whole expressions instead of just the one symbol. On the other hand in your function there is no operator so you can safely unquote without enclosing:
count_by_two_groups_B <- function(df, var1, var2) {
var1 <- enquo(var1)
var2 <- enquo(var2)
df %>%
group_by(!! var1, !! var2) %>%
count()
}
Finally, you could make your function more general by allowing a variable number of arguments. This is even easier to implement because dots are forwarded so there is no need to capture and unquote. Just pass them down to group_by()
:
count_by <- function(df, ...) {
df %>%
group_by(...) %>%
count()
}