Passing names of objects from ellipsis as strings to left_join


Background

I have a simple helper function that applies left_join to any number of passed tables in order to combine them, gather the result into long form, and return one object.

Example

# Settings ----------------------------------------------------------------

library("tidyverse")
set.seed(123)

# Data --------------------------------------------------------------------

sample_one <-
    tibble(
        column_a = c(1, 2),
        column_b = runif(n = 2),
        column_other = runif(n = 2)
    )
sample_two <-
    tibble(
        column_a = c(1, 2),
        column_b = runif(n = 2),
        column_other = runif(n = 2)
    )
sample_three <-
    tibble(
        column_a = c(1, 2),
        column_b = runif(n = 2),
        column_other = runif(n = 2)
    )

# Function ----------------------------------------------------------------

left_join_on_column_a <- function(keep_var, ...) {
    keep_var <- enquo(keep_var)
    dots <- list(...)
    clean_dfs <- map(dots, select, !!keep_var, "column_a")
    reduce(.x = clean_dfs,
           .f = left_join,
           "column_a") %>%
        gather(key = "model_type", !!keep_var, -column_a)
}

# Test --------------------------------------------------------------------

left_join_on_column_a(keep_var = column_b, sample_one, sample_two, sample_three)

Problem

I would like to be able to programmatically modify the suffix argument of left_join:

suffix: If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
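As a minimal sketch of what suffix does in a single join, consider two small toy tibbles. The names x_tbl and y_tbl and their values are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# Hypothetical toy tables sharing the key column_a and the non-key column_b
x_tbl <- tibble(column_a = c(1, 2), column_b = c(10, 20))
y_tbl <- tibble(column_a = c(1, 2), column_b = c(30, 40))

# With the default suffix, the clashing column gets ".x" and ".y"
default_join <- left_join(x_tbl, y_tbl, by = "column_a")
names(default_join)
# "column_a" "column_b.x" "column_b.y"

# A custom suffix replaces both endings
custom_join <- left_join(x_tbl, y_tbl, by = "column_a",
                         suffix = c("_one", "_two"))
names(custom_join)
# "column_a" "column_b_one" "column_b_two"
```

Note that suffix only applies when a column name actually clashes. When three tables are joined in sequence, the third join meets column_b.x, column_b.y and a plain column_b, so nothing clashes and the last column keeps its original name.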

Current results

# A tibble: 6 x 3
  column_a model_type column_b
     <dbl> <chr>         <dbl>
1        1 column_b.x   0.288 
2        2 column_b.x   0.788 
3        1 column_b.y   0.940 
4        2 column_b.y   0.0456
5        1 column_b     0.551 
6        2 column_b     0.457 

Desired results

# A tibble: 6 x 3
  column_a model_type      column_b
     <dbl> <chr>            <dbl>
1        1 sample_one       0.288 
2        2 sample_one       0.788 
3        1 sample_two       0.940 
4        2 sample_two       0.0456
5        1 sample_three     0.551 
6        2 sample_three     0.457 

The model_type column should reflect the name of each object passed via the ellipsis (...).

Attempts

I tried to capture the names of the objects passed via ..., but the arguments are unnamed, so names(dots) returns NULL and the attempt below does not work:

left_join_on_column_a <- function(keep_var, ...) {
    keep_var <- enquo(keep_var)
    dots <- list(...)
    table_names <- names(dots)
    clean_dfs <- map(dots, select, !!keep_var, "column_a")
    reduce(.x = clean_dfs,
           .f = left_join,
           "column_a", 
           table_names) %>%
        gather(key = "model_type", !!keep_var, -column_a)
}
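Why names(dots) comes back empty, and one possible way around it, can be sketched in base R. The function names and the two data frames below are hypothetical and serve only as an illustration. The sketch assumes the arguments are plain object names rather than long expressions.

```r
# With unnamed arguments, the list built from ... carries no names
names_from_list <- function(...) names(list(...))

# substitute() returns the unevaluated call, so each argument expression
# can be deparsed into a string
names_from_call <- function(...) {
    args <- as.list(substitute(list(...)))[-1]  # drop the leading `list`
    vapply(args, deparse, character(1))
}

# Hypothetical objects for illustration
first_df <- data.frame(id = 1)
second_df <- data.frame(id = 2)

names_from_list(first_df, second_df)
# NULL
names_from_call(first_df, second_df)
# "first_df" "second_df"
```

The evaluated list only knows the values, whereas the captured call still holds the symbols that were typed, which is the information the model_type column needs.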

Solution

  • One option is to rename column_b to the table's own name inside each table before joining, so the joined columns never clash and you don't have to worry about suffix

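    left_join_on_column_a <- function(keep_var, common_var, ...) {
        # Capture the names of the objects passed via ... as strings
        nm <- unname(sapply(rlang::enexprs(...), as.character))
        # Turn the bare column names into strings
        keep_var <- as.character(substitute(keep_var))
        common_var <- as.character(substitute(common_var))
    
        # Keep the join key and rename keep_var to the table's name (y)
        foo <- function(x, y) {
            x %>% select(!!common_var, !!y := !!keep_var)
        }
    
        # Rename in each table, join on the key, then reshape to long form
        reduce(.x = Map(foo, list(...), nm),
               .f = left_join,
               common_var) %>%
            gather("model_type", !!keep_var, -!!common_var)
    }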
    
    left_join_on_column_a(column_b, column_a, sample_one, sample_two, sample_three)
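If the wide intermediate table is not needed, a different technique gives the same long layout: stack the tables with bind_rows and its .id argument instead of joining them. This is only a sketch under that assumption, not the approach from the answer above. The helper stack_by_name and the toy tibbles are hypothetical. Unlike left_join, stacking keeps keys that are missing from the first table.

```r
library(dplyr)

# Hypothetical helper: stacks the tables instead of joining them
stack_by_name <- function(keep_var, common_var, ...) {
    # Capture the argument expressions passed via ... as strings
    table_names <- unname(vapply(rlang::enexprs(...), rlang::as_label,
                                 character(1)))
    # Turn the bare column names into strings
    keep_var <- rlang::as_name(rlang::ensym(keep_var))
    common_var <- rlang::as_name(rlang::ensym(common_var))

    # Name each table after the object it came from
    tables <- list(...)
    names(tables) <- table_names

    # Keep only the two columns of interest in every table
    trimmed <- lapply(tables, function(df) {
        select(df, all_of(c(common_var, keep_var)))
    })

    # .id records the list names in a new model_type column
    bind_rows(trimmed, .id = "model_type") %>%
        select(all_of(c(common_var, "model_type", keep_var)))
}

# Hypothetical toy data for illustration
first_tbl <- tibble(column_a = c(1, 2), column_b = c(0.1, 0.2),
                    column_other = c(9, 9))
second_tbl <- tibble(column_a = c(1, 2), column_b = c(0.3, 0.4),
                     column_other = c(9, 9))

stacked <- stack_by_name(column_b, column_a, first_tbl, second_tbl)
stacked
```

Here model_type holds "first_tbl" for the first two rows and "second_tbl" for the last two, with column_b carrying the corresponding values.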