I've written most of my question as comments in my provided reprex. I'm looking to improve the semantics of my code and answer a specific question regarding quoted variables as parameters to closure-like functions.
library(tidyverse)
# A df of file-paths split so all basenames
# are in the same column, but parent-dirs
# are spread across an arbitrary number of columns
# and filled with NAs.
dat <- tibble(
  ref01 = rep("analysis", 5),
  ref02 = c(NA, NA, "next", "next", "next"),
  ref03 = c(NA, NA, NA, NA, "last"),
  target = c("analysis.test1", "analysis.test2",
             "next.test3", "next.test4",
             "last.test5")
)
# For example this reprex df shows file-paths
# from a file-tree that looks like:
# analysis
# ├── next
# │   ├── last
# │   │   └── last.test5
# │   ├── next.test3
# │   └── next.test4
# ├── analysis.test1
# └── analysis.test2
dat
#> # A tibble: 5 x 4
#>   ref01    ref02 ref03 target
#>   <chr>    <chr> <chr> <chr>
#> 1 analysis <NA>  <NA>  analysis.test1
#> 2 analysis <NA>  <NA>  analysis.test2
#> 3 analysis next  <NA>  next.test3
#> 4 analysis next  <NA>  next.test4
#> 5 analysis next  last  last.test5
This function cleans up the 'target' test basenames. Each test name is preceded by its parent-dir name and a period (e.g. 'last.test5').
This function takes a "target" column and an arbitrary number of parent-dir columns. It reverses the list of parent-dirs and finds the first non-NA value. It then matches that value to the target value and removes it.
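As a sketch of those steps on a single row (row 3 of dat, with the values written out by hand rather than pulled from the tibble), the variable names parents, deepest and trimmed are only illustrative:

```r
library(dplyr)    # first() and the %>% pipe
library(purrr)    # discard()
library(stringr)  # str_remove()

# Row 3 of dat: the parent-dir columns ref01:ref03 and the target basename
parents <- c("analysis", "next", NA)
target  <- "next.test3"

# Reverse the parent-dirs and keep the first non-NA value (the deepest dir)
deepest <- parents %>% rev() %>% discard(is.na) %>% first()

# Remove "<deepest dir>." from the target; the period is escaped for the regex
trimmed <- str_remove(target, paste0(deepest, "\\."))
trimmed
```

Here deepest is "next", so the prefix "next." is stripped from the target.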
My question lies within this function:
Currently, the replace_pattern() function relies on the fact that the .key column is titled "target", which is hardcoded as the name of its first parameter. This is because of the way pmap() works: it takes any number of arguments from a list and matches them to parameters by name.
Since I want this function to work for arbitrarily deep file-paths, I need to find a way to handle varying .key names.
Is there a way to quote the .key variable so that it becomes the name of the first parameter of the replace_pattern() function?
trim_target <- function(.tbl, .key, ...){
  key <- tidyselect::eval_select(expr(c(!!enquo(.key))), .tbl)
  loc <- tidyselect::eval_select(expr(c(...)), .tbl)
  # First param has to be "target" since that's the name
  # of the .key column.
  replace_pattern <- function(target, ...){
    args <- c(...)
    pattern <- args %>%
      rev() %>%
      discard(is.na) %>%
      first() %>%
      paste0("\\.")
    unlist(str_remove(target, pattern))
  }
  pmap(.tbl[,c(key, loc)], replace_pattern) %>%
    unlist()
}
Expected Output:
This works as expected but is not scalable.
Also, in reference to question 01, I have to pass dat into the mutate() call, which I don't typically see done.
dat %>%
mutate(target = trim_target(dat, target, ref01:ref03))
#> # A tibble: 5 x 4
#>   ref01    ref02 ref03 target
#>   <chr>    <chr> <chr> <chr>
#> 1 analysis <NA>  <NA>  test1
#> 2 analysis <NA>  <NA>  test2
#> 3 analysis next  <NA>  test3
#> 4 analysis next  <NA>  test4
#> 5 analysis next  last  test5
Created on 2020-04-08 by the reprex package (v0.3.0)
Answering Question 1
When you say that you typically don't see dat passed to mutate(), that's because most functions do not require a data frame context. For example, when you see
mtcars %>% mutate( cyl = sqrt(cyl) )
the function sqrt() works directly with the values passed to it, without any care about where those values came from. In your case, you need the data frame context to resolve the ref01:ref03 expression. For this reason, the more appropriate pattern is to put the mutate() operation inside your function and have it return the resulting data frame instead.
Answering Question 2
pmap() only matches argument names if the input is a named list. If the list is unnamed, the matching is done by position. So, you can either 1) unname the argument list:
.tbl[,c(key, loc)] %>% as.list() %>% unname %>% pmap_chr(replace_pattern)
or 2) since you are already subsetting your columns with [, turn that into a proper select pattern and rename the selected column accordingly:
.tbl %>% select( target={{.key}}, ... ) %>% pmap_chr( replace_pattern )
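To see the name-versus-position matching in isolation, here is a minimal sketch. The helper show_args() and the toy list cols are made up for illustration; they only report which value ended up in which argument.

```r
library(purrr)

# Hypothetical helper: reports what landed in `target` and what fell into `...`
show_args <- function(target, ...) {
  paste0("target=", target, "; rest=", paste(c(...), collapse = ","))
}

# Toy input whose key column is NOT called "target"
cols <- list(abc = c("x.1", "y.2"), ref01 = c("x", "y"))

# Named list: arguments are matched by name, so nothing is bound to `target`
# and the call fails; the error is caught here to keep the example running
named_result <- tryCatch(pmap_chr(cols, show_args), error = function(e) "error")

# Option 1: drop the names so arguments are matched by position
by_position <- pmap_chr(unname(cols), show_args)

# Option 2: rename the key element to match the parameter name
by_rename <- pmap_chr(list(target = cols$abc, ref01 = cols$ref01), show_args)

by_position
```

Both options bind the first column to target, which is why either one removes the dependency on the column's original name.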
Putting it all together
With the two points in mind, this is how I would rewrite your function:
mutate_target <- function(.tbl, .key, ...){
  # No change from the OP
  replace_pattern <- function(target, ...){
    args <- c(...)
    pattern <- args %>%
      rev() %>%
      discard(is.na) %>%
      first() %>%
      paste0("\\.")
    unlist(str_remove(target, pattern))
  }
  result <- .tbl %>%
    select( target={{.key}}, ... ) %>%
    pmap_chr( replace_pattern )
  .tbl %>% mutate( {{.key}} := result )
}
Note that I took out your explicit eval_select() calls. You can pass the ... dots to dplyr verbs directly, while using curly-curly ( {{ .key }}, which is shorthand for !!enquo(.key) ) for single columns. Here's how you would use the new function:
dat %>% mutate_target( target, ref01:ref03 ) # Works
dat %>% rename( abc = target ) %>% mutate_target( abc, ref01:ref03 ) # Also works
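If the curly-curly shorthand is unfamiliar, the following small sketch compares it with the longer !!enquo() form. The wrappers pick_curly() and pick_enquo() and the toy tibble df are hypothetical names used only for this comparison.

```r
library(dplyr)

# Two hypothetical wrappers that forward a single column to select()
pick_curly <- function(.tbl, .key) select(.tbl, {{ .key }})
pick_enquo <- function(.tbl, .key) select(.tbl, !!enquo(.key))

df <- tibble(abc = c("x.1", "y.2"), ref01 = c("x", "y"))

# Both forms capture the unquoted column name and select the same column
same_result <- identical(pick_curly(df, abc), pick_enquo(df, abc))
same_result
```

Either form lets the caller write the column name unquoted, which is what allows mutate_target() to accept abc as well as target.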