Using `rlang` quasiquotation with `dplyr::*_join` functions


I am trying to write a custom function that uses rlang's quasiquotation and, internally, dplyr's join functions. Below is a minimal working example that illustrates my problem.

# needed libraries 
library(tidyverse)

# function definition
df_combiner <- function(data, x, group.by) {
  # check how many variables were entered for this grouping variable
  group.by <- as.list(rlang::quo_squash(rlang::enquo(group.by)))

  # `c(cut, clarity)` squashes to a call whose first element is `c`, which we
  # drop; a single bare name like `cut` yields a one-element list, which is
  # kept as is
  group.by <-
    if (length(group.by) == 1) {
      group.by
    } else {
      group.by[-1]
    }

  # creating internal dataframe
  df <- dplyr::group_by(.data = data, !!!group.by, .drop = TRUE)

  # creating dataframes to be joined: one with tally, one with summary
  df_tally <- dplyr::tally(df)
  df_mean <- dplyr::summarise(df, mean = mean({{ x }}, na.rm = TRUE))

  # without specifying `by` argument, this works but prints a message I want to avoid
  print(dplyr::left_join(x = df_tally, y = df_mean))

  # joining by specifying `by` argument (my failed attempt)
  dplyr::left_join(x = df_tally, y = df_mean, by = !!!group.by)
}

# using the function
df_combiner(diamonds, carat, c(cut, clarity))

#> Joining, by = c("cut", "clarity")

#> # A tibble: 40 x 4
#> # Groups:   cut [5]
#>    cut   clarity     n  mean
#>    <ord> <ord>   <int> <dbl>
#>  1 Fair  I1        210 1.36 
#>  2 Fair  SI2       466 1.20 
#>  3 Fair  SI1       408 0.965
#>  4 Fair  VS2       261 0.885
#>  5 Fair  VS1       170 0.880
#>  6 Fair  VVS2       69 0.692
#>  7 Fair  VVS1       17 0.665
#>  8 Fair  IF          9 0.474
#>  9 Good  I1         96 1.20 
#> 10 Good  SI2      1081 1.04 
#> # ... with 30 more rows

#> Error in !group.by: invalid argument type

As the output shows, the first join prints the message `Joining, by = c("cut", "clarity")`. I want to avoid that message by passing the `by` argument to the `*_join` function explicitly, but I am not sure how to do this. (I've tried `rlang::as_string`, `rlang::quo_name`, etc.)


Solution

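  • `by` is a regular argument that expects a character vector of column names, so `!!!` is not interpreted there. Instead, convert each symbol in `group.by` to a string with `rlang::as_string()`:

    dplyr::left_join(x = df_tally, y = df_mean,
                     by = purrr::map_chr(group.by, rlang::as_string))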
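
To see the conversion in isolation, here is a minimal sketch. `group_syms` is a hypothetical stand-in for the `group.by` list built inside the function, and `vapply()` is used in place of `map_chr()` only to avoid the purrr dependency. Outside a quoting function, `!!!x` is just three applications of base R's `!`, which is where the error in the question comes from.

```r
library(rlang)

# Hypothetical stand-in for the list of symbols held in `group.by`
group_syms <- list(quote(cut), quote(clarity))

# In a regular (non-quoting) argument, `!!!` is plain negation applied three
# times, and negating a list of symbols fails; capture the message instead
bang_result <- tryCatch(
  !!!group_syms,
  error = function(e) conditionMessage(e)
)

# Convert each symbol to its name, giving the character vector `by` expects
by_cols <- vapply(group_syms, rlang::as_string, character(1))
by_cols  # character vector: "cut", "clarity"
```
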
    

    df_combiner <- function(data, x, group.by) {
      # check how many variables were entered for this grouping variable
      group.by <- as.list(rlang::quo_squash(rlang::enquo(group.by)))
    
      # `c(cut, clarity)` squashes to a call whose first element is `c`, which we
      # drop; a single bare name like `cut` yields a one-element list, which is
      # kept as is
      group.by <-
        if (length(group.by) == 1) {
          group.by
        } else {
          group.by[-1]
        }
    
      # creating internal dataframe
      df <- dplyr::group_by(.data = data, !!!group.by, .drop = TRUE)
    
      # creating dataframes to be joined: one with tally, one with summary
      df_tally <- dplyr::tally(df)
      df_mean <- dplyr::summarise(df, mean = mean({{ x }}, na.rm = TRUE))
    
      # join on the grouping columns; `by` needs a character vector, so each
      # symbol in `group.by` is converted to a string first
      dplyr::left_join(x = df_tally, y = df_mean, by = purrr::map_chr(group.by, rlang::as_string))
    
    }
    

    Checking the result:

    df_combiner(diamonds, carat, c(cut, clarity))
    # A tibble: 40 x 4
    # Groups:   cut [5]
    #   cut   clarity     n  mean
    #   <ord> <ord>   <int> <dbl>
    # 1 Fair  I1        210 1.36 
    # 2 Fair  SI2       466 1.20 
    # 3 Fair  SI1       408 0.965
    # 4 Fair  VS2       261 0.885
    # 5 Fair  VS1       170 0.880
    # 6 Fair  VVS2       69 0.692
    # 7 Fair  VVS1       17 0.665
    # 8 Fair  IF          9 0.474
    # 9 Good  I1         96 1.20 
    #10 Good  SI2      1081 1.04 
    # … with 30 more rows
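
As a possible alternative, the column names can be read back from the grouped data frame instead of being derived from the captured expression. The sketch below is not part of the original answer. It assumes dplyr >= 1.0.0 (for `across()` and the `.groups` argument), uses the hypothetical name `df_combiner2`, and uses the built-in `mtcars` data so that it runs without ggplot2.

```r
library(dplyr)

# Hypothetical variant of `df_combiner()`; assumes dplyr >= 1.0.0
df_combiner2 <- function(data, x, group.by) {
  # `across()` accepts tidyselect input, so both `cyl` and `c(cyl, am)` work
  df <- dplyr::group_by(data, dplyr::across({{ group.by }}))

  # grouping column names as a character vector, ready to use in `by`
  by_cols <- dplyr::group_vars(df)

  # data frames to be joined: one with the tally, one with the summary
  df_tally <- dplyr::tally(df)
  df_mean <- dplyr::summarise(
    df,
    mean = mean({{ x }}, na.rm = TRUE),
    .groups = "drop_last"
  )

  # explicit `by`, so no "Joining, by = ..." message is printed
  dplyr::left_join(x = df_tally, y = df_mean, by = by_cols)
}

res <- df_combiner2(mtcars, mpg, c(cyl, am))
```

With this approach the manual handling of the leading `c` is no longer needed, because tidyselect resolves the column selection.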