
How to pass more than one environmental variable name to `by` parameter in dplyr `join` functions?


Let's say I'm writing a wrapper function for full_join (but the question applies equally to left_join, right_join, and inner_join). The join variables need to be specified by the user in the function call, and the function needs to handle cases where the join variables have different names in the two data frames.

Here is the example given in the help pages, outside of a function:

full_join(band_members, band_instruments2, by = c("name" = "artist"))

Here is the function that I expected to work:

join_wrapper <- function(data1 = band_members, by1, by2) {
  by1 <- enquo(by1)
  by2 <- enquo(by2)
  
  data2 <- band_instruments2
  
  full_join(data1, data2, by = c(quo_name(by1) = quo_name(by2)) )
}

However, I get this error message: Error: unexpected '=' in: "full_join(data1, band_instruments2, by = c(quo_name(by1) ="

I've also tried as_name() instead of quo_name(), and creating a separate character vector containing the join-by information to pass to by in full_join() like this:

join_wrapper <- function(data1, by1, by2) {
  by1 <- enquo(by1)
  by2 <- enquo(by2)

  data2 <- band_instruments2
  
  .by = c( quo_name(by1) = quo_name(by2) )
  
  full_join(data1, data2, by = .by )
}

All this does is shift the error from the full_join call to the .by = call. I even tried using set_names() as suggested in this answer: https://stackoverflow.com/a/49342789/9154433, but I still get the same error message.

Interestingly, this works if the join variables have the same name as in:

join_wrapper <- function(data1, by1) {
  by1 <- enquo(by1)

  data2 <- band_instruments
  
  full_join( data1, data2, by = quo_name(by1) )
}

join_wrapper(band_members, name)

In my real-world problem, data2 is created within the function, so an alternative solution would be to dynamically rename the join variable in data2 to match the join variable in data1. However, I'm not using dynamic dots, so the := operator doesn't work, and using assign() within the mutate call, as in mutate(band_instruments2, assign(quo_name(by1), artist)), creates a new column literally named assign(quo_name(by1), artist).


Solution

  • We can use the curly-curly operator ({{ }}) inside join_by(), which is available in dplyr 1.1.0 and later:

    library(dplyr)
    
    join_wrapper <- function(data1 = band_members, by1, by2) {
    
      data2 <- band_instruments2
      
      full_join(data1, data2, by = join_by({{ by1 }} == {{ by2 }}))
    }
    
    join_wrapper(by1 = name, by2 = artist)
    #> # A tibble: 4 × 3
    #>   name  band    plays 
    #>   <chr> <chr>   <chr> 
    #> 1 Mick  Stones  <NA>  
    #> 2 John  Beatles guitar
    #> 3 Paul  Beatles bass  
    #> 4 Keith <NA>    guitar
    

    Created on 2023-03-08 with reprex v2.0.2
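
  • For dplyr versions without join_by(), a possible alternative is to build the named character vector programmatically. The original attempt fails at parse time because R requires the left-hand side of = inside c() to be a literal name or string, not a function call. The sketch below sidesteps that with setNames(); the function name join_wrapper_named is made up for illustration, and data2 is hard-coded as in the question.

```r
library(dplyr)
library(rlang)

# Hypothetical variant of the wrapper from the question.
join_wrapper_named <- function(data1 = band_members, by1, by2) {

  data2 <- band_instruments2

  # Capture the bare column names and convert them to strings.
  by1_chr <- as_name(enquo(by1))
  by2_chr <- as_name(enquo(by2))

  # setNames(value, name) builds the equivalent of c("name" = "artist")
  # without needing a literal name on the left of `=`.
  by_vec <- setNames(by2_chr, by1_chr)

  full_join(data1, data2, by = by_vec)
}

join_wrapper_named(by1 = name, by2 = artist)
```

    This should return the same four-row tibble as the join_by() version, since by_vec is the same named vector that the help-page example passes to by.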