Tags: r, dplyr, purrr, rlang, tidyeval

Use quoted parameter as variable name for closure instantiation?


I've written most of my question as comments in my provided reprex. I'm looking to improve the semantics of my code and answer a specific question regarding quoted variables as parameters to closure-like functions.

library(tidyverse)

# A df of file paths, split so that all basenames
# are in the same column, while parent dirs
# are spread across an arbitrary number of columns
# and padded with NAs.
dat <- tibble(
  ref01 = rep("analysis", 5),
  ref02 = c(NA, NA, "next", "next", "next"),
  ref03 = c(NA, NA, NA, NA, "last"),
  target = c("analysis.test1", "analysis.test2",
             "next.test3", "next.test4",
             "last.test5")
)

# For example this reprex df shows file-paths
# from a file-tree that looks like:
# analysis
# ├── next
# │   ├── last
# │   │   └── last.test5
# │   ├── next.test3
# │   └── next.test4
# ├── analysis.test1
# └── analysis.test2
dat
#> # A tibble: 5 x 4
#>   ref01    ref02 ref03 target        
#>   <chr>    <chr> <chr> <chr>         
#> 1 analysis <NA>  <NA>  analysis.test1
#> 2 analysis <NA>  <NA>  analysis.test2
#> 3 analysis next  <NA>  next.test3    
#> 4 analysis next  <NA>  next.test4    
#> 5 analysis next  last  last.test5

This function cleans up the test basenames in the 'target' column. Each test name is prefixed with its parent directory's name and a period (e.g. 'last.test5').

This function takes a "target" column and an arbitrary number of parent-dir columns. For each row, it reverses the parent-dir values and takes the first non-NA one (i.e. the deepest parent). It then removes that value, followed by a period, from the target value.
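The per-row step described above can be traced on a single row. This is a minimal sketch using row 3 of `dat`; the names `parents`, `deepest`, `pattern`, and `cleaned` are illustrative only and do not appear in the original code.

```r
library(purrr)
library(stringr)

# Row 3 of `dat`: parent dirs ordered from shallowest to deepest
parents <- c(ref01 = "analysis", ref02 = "next", ref03 = NA)
target  <- "next.test3"

# Reverse so the deepest level comes first, drop NAs, keep the first survivor
deepest <- discard(rev(parents), is.na)[[1]]

# Build the regex next\. (a literal period) and strip it from the target
pattern <- paste0(deepest, "\\.")
cleaned <- str_remove(target, pattern)

deepest  # "next"
cleaned  # "test3"
```

Note that the directory name is used as a regular expression, so a directory name containing regex metacharacters would need escaping.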

My questions concern this function:

  1. Is there a more semantic way of building this function so that it can be expressed inside a `mutate()` call?
  2. Currently, the replace_pattern() function relies on the .key column being named "target", which is hard-coded as its first parameter.

    This is because `pmap()` takes p arguments from a list and matches them to the function's parameters by name.

    Since I want this function to work for arbitrarily deep file paths, I need a way to handle varying .key names.

    Is there a way to quote the .key variable so that it becomes the name of the first parameter of the replace_pattern() function?

trim_target <- function(.tbl, .key, ...){
  key <- tidyselect::eval_select(expr(c(!!enquo(.key))), .tbl)
  loc <- tidyselect::eval_select(expr(c(...)), .tbl)

  # First param has to be "target" since that's the name
  # of the .key column.
  replace_pattern <- function(target, ...){
    args <- c(...)
    pattern <- args %>% 
      rev() %>% 
      discard(is.na) %>% 
      first() %>% 
      paste0("\\.")

    unlist(str_remove(target, pattern))
  }

  pmap(.tbl[,c(key, loc)], replace_pattern) %>% 
    unlist()
}

Expected output: This works as expected, but it is not scalable. Also, regarding question 1, I have to pass dat into the mutate() call, which I don't typically see done.

dat %>% 
  mutate(target = trim_target(dat, target, ref01:ref03))
#> # A tibble: 5 x 4
#>   ref01    ref02 ref03 target
#>   <chr>    <chr> <chr> <chr> 
#> 1 analysis <NA>  <NA>  test1 
#> 2 analysis <NA>  <NA>  test2 
#> 3 analysis next  <NA>  test3 
#> 4 analysis next  <NA>  test4 
#> 5 analysis next  last  test5

Created on 2020-04-08 by the reprex package (v0.3.0)


Solution

    Answering Question 1

    When you say that you typically don't see dat passed to mutate(), that's because most functions do not require a data frame context. For example, when you see

    mtcars %>% mutate( cyl = sqrt(cyl) )
    

    the function sqrt() works directly on the values passed to it, regardless of where those values came from. In your case, you need the data frame context to resolve the ref01:ref03 expression. For this reason, the more appropriate pattern is to put the mutate() operation inside your function and have it return the resulting data frame instead.
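An alternative worth knowing, which is not what this answer proposes: if the parent-dir columns are known in advance, `dplyr::coalesce()` finds the deepest non-NA parent in a vectorised way, so the whole operation fits inside a plain `mutate()` without passing `dat` in. This is a minimal sketch assuming exactly three ref columns; for an arbitrary number you would need to build the `coalesce()` call programmatically. The tibble `paths` is a hypothetical three-row cut of the question's data.

```r
library(dplyr)
library(stringr)

# A three-row cut of the question's data
paths <- tibble(
  ref01  = c("analysis", "analysis", "analysis"),
  ref02  = c(NA, "next", "next"),
  ref03  = c(NA, NA, "last"),
  target = c("analysis.test1", "next.test3", "last.test5")
)

# coalesce() returns, row by row, the first non-NA value among its
# arguments, so listing the columns deepest-first yields the deepest parent.
# str_remove() is vectorised over both the string and the pattern.
cleaned <- paths %>%
  mutate(target = str_remove(target, paste0(coalesce(ref03, ref02, ref01), "\\.")))

cleaned$target
```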

    Answering Question 2

    pmap() matches list elements to function arguments by name only when the input list is named. If the list is unnamed, the matching is done by position. So you can either 1) unname the argument list:

    .tbl[,c(key, loc)] %>% as.list() %>% unname %>% pmap_chr(replace_pattern)
    

    or 2) since you are already subsetting your columns with [, turn that into a proper select pattern and rename the selected column accordingly:

    .tbl %>% select( target={{.key}}, ... ) %>% pmap_chr( replace_pattern )
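Both options can be checked on a toy example. In this sketch, `f()` is a hypothetical stand-in for replace_pattern() whose first parameter is named `target`, and the hypothetical tibble `tbl` has its key column named `abc`, so name matching fails unless the names are dropped or the column is renamed.

```r
library(dplyr)
library(purrr)

# Stand-in for replace_pattern(): the first parameter is called `target`
f <- function(target, ...) paste(target, ...)

tbl <- tibble(abc = c("a", "b"), ref = c("1", "2"))

# Named input: pmap calls f(abc = "a", ref = "1"), so `target` is never
# supplied and paste() errors when it tries to evaluate it.
named_result <- tryCatch(pmap_chr(tbl, f), error = function(e) "error")

# Option 1: drop the names so arguments are matched by position.
by_position <- pmap_chr(unname(as.list(tbl)), f)

# Option 2: rename the key column to `target` while selecting.
by_name <- tbl %>% select(target = abc, ref) %>% pmap_chr(f)
```

Both `by_position` and `by_name` evaluate to `c("a 1", "b 2")`.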
    

    Putting it all together

    With the two points in mind, this is how I would rewrite your function:

    mutate_target <- function(.tbl, .key, ...){
    
      # No change from the OP
      replace_pattern <- function(target, ...){
        args <- c(...)
        pattern <- args %>%
          rev() %>%
          discard(is.na) %>%
          first() %>%
          paste0("\\.")
    
        unlist(str_remove(target, pattern))
      }
    
      result <- .tbl %>% select( target={{.key}}, ... ) %>% pmap_chr( replace_pattern )
      .tbl %>% mutate( {{.key}} := result )
    }
    

    Note that I took out your explicit eval_select() calls. You can pass the ... dots to dplyr verbs directly, and use curly-curly ({{, which is shorthand for !!enquo()) for single columns. Here's how you would use the new function:

    dat %>% mutate_target( target, ref01:ref03 )                           # Works
    dat %>% rename( abc = target ) %>% mutate_target( abc, ref01:ref03 )   # Also works
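For completeness, question 2 can also be answered literally: `rlang::new_function()` builds a function from a set of formals and a body expression, so the first parameter can be named after whichever column the caller chooses. This is only a sketch, not the approach recommended above. `make_replace_pattern()`, `replace_abc`, and `rows` are hypothetical names, and the sketch assumes the key name arrives as a string (inside a function taking `.key`, `rlang::as_name(enquo(.key))` would give that string).

```r
library(rlang)
library(purrr)
library(stringr)

# Hypothetical factory: returns a version of replace_pattern() whose first
# parameter is named after the key column instead of a hard-coded `target`.
make_replace_pattern <- function(key_name) {
  key_sym <- sym(key_name)
  # Take the formals (x, ...) from a template function, then rename x
  params <- formals(function(x, ...) NULL)
  names(params)[1] <- key_name
  # Inject the key symbol into the body with !!
  body <- expr({
    parents <- c(...)
    deepest <- discard(rev(parents), is.na)[[1]]
    str_remove(!!key_sym, paste0(deepest, "\\."))
  })
  new_function(params, body)
}

replace_abc <- make_replace_pattern("abc")

rows <- list(
  abc   = c("next.test3", "last.test5"),
  ref01 = c("analysis", "analysis"),
  ref02 = c("next", "next"),
  ref03 = c(NA, "last")
)

# pmap_chr() now matches `abc` by name to the generated first parameter
out <- pmap_chr(rows, replace_abc)
```

Here `names(formals(replace_abc))` is `c("abc", "...")` and `out` is `c("test3", "test5")`. Renaming the column or unnaming the list, as shown in the answer, avoids this metaprogramming altogether.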