In my code, which uses dplyr, I often perform certain operations on a dataframe variable (here assumed to be simply multiplication by 2, to simplify the MRE), optionally group on another variable, and then select only some of the resulting variables. To prevent code duplication, I want to write a function.
The test dataframe is:
library(dplyr)    # provides %>%, select(), mutate(), group_by(), ...
library(ggplot2)  # provides the msleep data set
msleep_mini <- msleep[1:20, ]
The function must reproduce the following behavior. If called with a single argument, say, sleep_total, it simply multiplies sleep_total by 2 and returns a dataframe containing the columns name, vore, order and sleep_total:
# test_1
msleep_mini %>%
  group_double_select(sleep_total)
#> # A tibble: 20 x 4
#> name vore order sleep_total
#> <chr> <chr> <chr> <dbl>
#> 1 Cheetah carni Carnivora 24.2
#> 2 Owl monkey omni Primates 34
#> 3 Mountain beaver herbi Rodentia 28.8
#> 4 Greater short-tailed shrew omni Soricomorpha 29.8
#> 5 Cow herbi Artiodactyla 8
#> 6 Three-toed sloth herbi Pilosa 28.8
#> 7 Northern fur seal carni Carnivora 17.4
#> 8 Vesper mouse <NA> Rodentia 14
#> 9 Dog carni Carnivora 20.2
#> 10 Roe deer herbi Artiodactyla 6
If called with two arguments, the second one is interpreted as a grouping variable. Again, the first one is multiplied by 2, but now the dataframe is also grouped by the second argument and sorted by it, and finally an id column, containing the running row number within each group, is added to the dataframe. In other words, the output would be
# test_2
msleep_mini %>%
  group_double_select(sleep_total, vore)
#> # A tibble: 20 x 5
#> # Groups: vore [4]
#> vore name order sleep_total id
#> <chr> <chr> <chr> <dbl> <int>
#> 1 carni Cheetah Carnivora 24.2 1
#> 2 carni Northern fur seal Carnivora 17.4 2
#> 3 carni Dog Carnivora 20.2 3
#> 4 carni Long-nosed armadillo Cingulata 34.8 4
#> 5 herbi Mountain beaver Rodentia 28.8 1
#> 6 herbi Cow Artiodactyla 8 2
#> 7 herbi Three-toed sloth Pilosa 28.8 3
#> 8 herbi Roe deer Artiodactyla 6 4
#> 9 herbi Goat Artiodactyla 10.6 5
#> 10 herbi Guinea pig Rodentia 18.8 6
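For reference, the grouped behavior described above could be written without a function roughly as follows. This is only a sketch of the target pipeline, not the requested function: the name manual_result is illustrative, and it assumes the 20-row subset of msleep that the printed tibble dimensions suggest.

```r
library(dplyr)
library(ggplot2)  # only needed for the msleep example data

# Assumption: a 20-row subset, matching the "20 x 5" header printed above.
msleep_mini <- msleep[1:20, ]

manual_result <- msleep_mini %>%
  select(vore, name, order, sleep_total) %>%  # keep only the columns of interest
  mutate(sleep_total = sleep_total * 2) %>%   # the "operation" of the MRE
  group_by(vore) %>%                          # group on the second variable
  arrange(vore) %>%                           # sort by the grouping variable
  mutate(id = row_number())                   # running row number within each group

manual_result
```

The hard-coded sleep_total and vore are exactly what the function has to turn into arguments.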
Of course, the function must work with arbitrary variables (as long as they can be found in the dataframe):
# test_3
msleep_mini %>%
  group_double_select(sleep_rem, order)
#> # A tibble: 20 x 5
#> # Groups: order [9]
#> order name vore sleep_rem id
#> <chr> <chr> <chr> <dbl> <int>
#> 1 Artiodactyla Cow herbi 1.4 1
#> 2 Artiodactyla Roe deer herbi NA 2
#> 3 Artiodactyla Goat herbi 1.2 3
#> 4 Carnivora Cheetah carni NA 1
#> 5 Carnivora Northern fur seal carni 2.8 2
#> 6 Carnivora Dog carni 5.8 3
#> 7 Cingulata Long-nosed armadillo carni 6.2 1
#> 8 Didelphimorphia North American Opossum omni 9.8 1
#> 9 Hyracoidea Tree hyrax herbi 1 1
#> 10 Pilosa Three-toed sloth herbi 4.4 1
It seems to me that the only way to write group_double_select in a robust and maintainable way is to use tidy evaluation, but I may be wrong. Can you help me?
We can use missing() to check whether the grouping argument was supplied in the call to the function:
group_double_select <- function(data, colVar, groupVar) {
  # Capture the column to be doubled as a quosure
  colVar <- enquo(colVar)
  if (missing(groupVar)) {
    # No grouping variable supplied: select and double only
    data %>%
      select(name, vore, order, !!colVar) %>%
      mutate(!! quo_name(colVar) := !! colVar * 2)
  } else {
    # Grouping variable supplied: capture it, then group, number and sort
    groupVar <- enquo(groupVar)
    data %>%
      select(name, vore, order, !!colVar) %>%
      mutate(!! quo_name(colVar) := !! colVar * 2) %>%
      group_by(!! groupVar) %>%
      mutate(id = row_number()) %>%
      arrange(!! groupVar)
  }
}
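As a small aside, the behavior of missing() can be seen in isolation with plain base R. The helper describe_call below is hypothetical and exists only for this illustration. Note that the check is done before the argument is touched, as in the function above, since missing() is only reliable before the argument is altered.

```r
# Hypothetical helper, used only to illustrate missing().
describe_call <- function(x, group) {
  # missing() is TRUE when the caller did not supply `group`
  if (missing(group)) {
    "ungrouped"
  } else {
    "grouped"
  }
}

describe_call(1)     # `group` not supplied
describe_call(1, 2)  # `group` supplied
```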
Testing:
msleep_mini %>%
  group_double_select(sleep_total, vore) %>%
  head
# A tibble: 6 x 5
# Groups: vore [2]
# name vore order sleep_total id
# <chr> <chr> <chr> <dbl> <int>
#1 Cheetah carni Carnivora 24.2 1
#2 Northern fur seal carni Carnivora 17.4 2
#3 Dog carni Carnivora 20.2 3
#4 Long-nosed armadillo carni Cingulata 34.8 4
#5 Mountain beaver herbi Rodentia 28.8 1
#6 Cow herbi Artiodactyla 8 2
msleep_mini %>%
  group_double_select(sleep_total) %>%
  head
# A tibble: 6 x 4
# name vore order sleep_total
# <chr> <chr> <chr> <dbl>
#1 Cheetah carni Carnivora 24.2
#2 Owl monkey omni Primates 34
#3 Mountain beaver herbi Rodentia 28.8
#4 Greater short-tailed shrew omni Soricomorpha 29.8
#5 Cow herbi Artiodactyla 8
#6 Three-toed sloth herbi Pilosa 28.8
msleep_mini %>%
  group_double_select(sleep_rem, order) %>%
  head
# A tibble: 6 x 5
# Groups: order [2]
# name vore order sleep_rem id
# <chr> <chr> <chr> <dbl> <int>
#1 Cow herbi Artiodactyla 1.4 1
#2 Roe deer herbi Artiodactyla NA 2
#3 Goat herbi Artiodactyla 1.2 3
#4 Cheetah carni Carnivora NA 1
#5 Northern fur seal carni Carnivora 2.8 2
#6 Dog carni Carnivora 5.8 3
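One caveat: because the function selects name, vore, order and the doubled column before grouping, a grouping variable outside that set (for example conservation) would already have been dropped by the time group_by() runs. A possible variant, sketched here under the assumption of a recent dplyr/rlang that supports the embrace operator and glue-style names on the left of :=, also selects the grouping column. The name group_double_select_embrace is illustrative, and this is an alternative sketch rather than the approach shown above.

```r
library(dplyr)
library(ggplot2)  # only needed for the msleep example data

# Illustrative variant; assumes {{ }} and "{{ }}" := are available (recent rlang).
group_double_select_embrace <- function(data, colVar, groupVar) {
  if (missing(groupVar)) {
    data %>%
      select(name, vore, order, {{ colVar }}) %>%
      mutate("{{ colVar }}" := {{ colVar }} * 2)
  } else {
    data %>%
      # Selecting the grouping column first keeps it available for group_by(),
      # even when it is not one of name, vore or order.
      select({{ groupVar }}, name, vore, order, {{ colVar }}) %>%
      mutate("{{ colVar }}" := {{ colVar }} * 2) %>%
      group_by({{ groupVar }}) %>%
      mutate(id = row_number()) %>%
      arrange({{ groupVar }})
  }
}

# Grouping on a column outside the fixed selection
res_cons <- msleep[1:20, ] %>%
  group_double_select_embrace(sleep_total, conservation)

# Grouping on vore puts the grouping column first
res_vore <- msleep[1:20, ] %>%
  group_double_select_embrace(sleep_total, vore)
```

With this variant the grouping column comes first in the result, which is the column order shown in the question's expected output.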