
R quasiquotation & tidyeval for dynamic variable references in my own functions


I'm trying to get my head around using tidyverse quasiquotation in my own R functions. I've read the question "Passing a list of arguments to a function with quasiquotation" and all of https://tidyeval.tidyverse.org/

But I still can't get it to work.

Assume I have the following data:

dat <- data.frame(time   = runif(20),
                  group1 = rep(1:2, times = 10),
                  group2 = rep(1:2, each = 10),
                  group3 = rep(3:4, each = 10))

What I want to do now is write a function that:

  • takes a data set
  • lets me specify the variable that contains the time (note: in another data set it might be called "hours", "qtime" or something else)
  • lets me specify the grouping variables for the operations/statistics

So I want the user to be able to call the function like this:

test_function(data = dat, time_var = "time", group_vars = c("group1", "group3"))

Note that next time I might choose different grouping variables, or none at all.

Within the function I want to, for example:

  • calculate statistics on the time variable, e.g. its quantiles, split by my grouping variables

Here's one of my latest tries:

test_function <- function(data, time_var = NULL, group_vars = NULL)
{
# Note: I initialize the variables with NULL since, e.g., the user might not specify a grouping,
# and I want to check for that in my function at some point
time_var <- enquo(time_var)
group_vars <- enquos(group_vars)

# Here I try to group by my grouping variables
temp_data <- data %>%
    group_by_at(group_vars) %>%
    mutate(!!sym(time_var) := !!sym(time_var) / 60)

# Here I'm calculating some stats  
time_stats <- temp_data %>%
    summarize_at(vars(!!time_var), list(p0.1_time   = ~quantile(., probs = 0.1, na.rm = T),
                                        p0.2_time   = ~quantile(., probs = 0.2, na.rm = T),
                                        p0.3_time   = ~quantile(., probs = 0.3, na.rm = T),
                                        p0.4_time   = ~quantile(., probs = 0.4, na.rm = T),
                                        p0.5_time   = ~quantile(., probs = 0.5, na.rm = T),
                                        p0.6_time   = ~quantile(., probs = 0.6, na.rm = T),
                                        p0.7_time   = ~quantile(., probs = 0.7, na.rm = T),
                                        p0.8_time   = ~quantile(., probs = 0.8, na.rm = T),
                                        p0.9_time   = ~quantile(., probs = 0.9, na.rm = T),
                                        p0.95_time  = ~quantile(., probs = 0.95, na.rm = T)))

}

What is wrong with my code? In particular, I struggle with !!, !!!, sym, enquo and enquos. Why doesn't group_by_at need !!, whereas my summarize_at and mutate calls do?
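A minimal sketch of where the errors come from, assuming rlang and dplyr 1.x are installed (the helper `capture()` is hypothetical, for illustration only). It inspects what `enquo()` and `sym()` return when the argument is a string, and shows that `group_by_at()` accepts column names as a plain character vector:

```r
library(rlang)
library(dplyr)

dat <- data.frame(time   = runif(20),
                  group1 = rep(1:2, times = 10),
                  group2 = rep(1:2, each = 10))

# Hypothetical helper: enquo() captures whatever the caller typed. For a string
# argument that is just the string "time" wrapped in a quosure, not a column reference
capture <- function(x) enquo(x)
q <- capture("time")
captured_is_string <- is_string(quo_get_expr(q))   # TRUE

# sym() only converts strings, so handing it a quosure (as the mutate() line
# of the attempt does) raises an error
sym_of_quosure <- tryCatch(sym(q), error = function(e) "error")

# Converting the string itself yields a symbol, which mutate()/summarise()
# can look up as a column once it is unquoted with !!
s <- sym("time")

# The *_at() verbs take column names as a character vector, so no symbol
# conversion (and no !!) is needed there
g <- group_by_at(dat, c("group1", "group2"))
grouping <- group_vars(g)   # c("group1", "group2")
```

Roughly: strings need `sym()`/`syms()` before `!!`/`!!!`, while `enquo()`/`enquos()` are meant for arguments the user types as bare column names.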


Solution

  • Make these changes:

    • use sym and syms rather than enquo and enquos
    • use !! and !!! respectively.
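    • create po as a list column and then use unnest_wider (from tidyr) to expand it into columns
    • quantile is already vectorized over its probabilities, so one call returns all of them; there is no need for map or ten separate formulas
    • the division by 60 can be done right inside the quantile call, which eliminates the mutate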
    • consolidate the pipelines into a single pipeline
    • use TRUE rather than T since the latter can be masked by a variable of that name whereas no variable may be called TRUE.
    • we can use plain group_by and summarize
    • group3 in the sample data (rep(3:4, each = 10)) splits the rows exactly like group2, so we used group2 instead
    • this does not make sense without time_var so remove the default of NULL
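The first few changes can be checked in isolation. A small sketch, assuming rlang, dplyr and tidyr (1.0 or later) are installed; `rlang::expr()` is used only to display the call that `!!` and `!!!` build, and the toy vector passed to `quantile()` is made up:

```r
library(rlang)
library(dplyr)
library(tidyr)

# syms() turns a character vector into a list of symbols; !!! splices that list
# in as separate arguments, while !! inserts a single symbol
groups <- c("group1", "group2")
grp_call <- expr(group_by(data, !!!syms(groups)))
# grp_call is the expression group_by(data, group1, group2)

time_call <- expr(quantile(!!sym("time") / 60, c(0.1, 0.5)))
# time_call is quantile(time / 60, c(0.1, 0.5)): rlang makes !! bind tighter than /

# quantile() takes a vector of probabilities and returns a named vector,
# so one call produces every requested quantile
q <- quantile(c(0, 10, 20, 30, 40), probs = c(0.1, 0.5))
# q is c(`10%` = 4, `50%` = 20)

# Wrapping such vectors in a list column and calling unnest_wider() spreads
# the named elements into one column per name
toy <- tibble(id = 1:2, po = list(q, q * 2))
wide <- unnest_wider(toy, po)
# wide has columns id, `10%`, `50%`
```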

    This gives the following code

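    library(dplyr)  # group_by, summarize, ungroup, %>%; also re-exports sym/syms from rlang
    library(tidyr)  # unnest_wider

    test_function <- function(data, time_var, group_vars = NULL) {
      p <- c(1:9/10, 0.95)             # probabilities 0.1, ..., 0.9, 0.95
      time_var <- sym(time_var)         # "time" -> the symbol time
      group_vars <- syms(group_vars)    # c("group1", "group2") -> list of symbols (empty for NULL)
      data %>%
        group_by(!!!group_vars) %>%     # !!! splices each symbol in as its own argument
        summarize(po = list(quantile(!!time_var / 60, p, na.rm = TRUE))) %>%
        ungroup() %>%
        unnest_wider(po)                # one column per named quantile
    }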
    
    test_function(data = dat, time_var = "time", group_vars = c("group1", "group2")) 
    

    giving:

    # A tibble: 4 x 12
      group1 group2   `10%`   `20%`   `30%`   `40%`   `50%`   `60%`   `70%`   `80%`   `90%`   `95%`
       <int>  <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
    1      1      1 0.00237 0.00432 0.00654 0.00903 0.0115  0.0120  0.0124  0.0133  0.0147  0.0154 
    2      1      2 0.00244 0.00251 0.00281 0.00335 0.00388 0.00410 0.00432 0.00493 0.00591 0.00640
    3      2      1 0.00371 0.00381 0.00468 0.00632 0.00796 0.0101  0.0122  0.0136  0.0143  0.0147 
    4      2      2 0.00385 0.00538 0.00630 0.00660 0.00691 0.00725 0.00759 0.00907 0.0117  0.0130
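If you'd rather have users pass bare column names (e.g. time instead of "time"), that is where enquo and enquos come in. A minimal sketch of that variant, assuming dplyr and tidyr 1.x; `test_function_bare` is a hypothetical name, and the grouping columns go through `...` so that `enquos()` captures each one:

```r
library(dplyr)
library(tidyr)

# Hypothetical variant: the caller passes bare column names instead of strings,
# e.g. test_function_bare(dat, time, group1, group2)
test_function_bare <- function(data, time_var, ...) {
  p <- c(1:9/10, 0.95)
  time_var <- enquo(time_var)   # quosure holding the bare name the caller typed
  group_vars <- enquos(...)     # list of quosures, one per grouping column
  data %>%
    group_by(!!!group_vars) %>%
    summarize(po = list(quantile(!!time_var / 60, p, na.rm = TRUE))) %>%
    ungroup() %>%
    unnest_wider(po)
}

dat <- data.frame(time   = runif(20),
                  group1 = rep(1:2, times = 10),
                  group2 = rep(1:2, each = 10))

by_group <- test_function_bare(dat, time, group1, group2)   # 4 rows
no_group <- test_function_bare(dat, time)                   # a single row
```

Without grouping columns, `enquos(...)` returns an empty list, so `group_by()` adds no groups and you get one row of quantiles. In recent rlang versions `{{ time_var }}` is a shorthand for `!!enquo(time_var)`.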