I would like to create a function that takes a data frame and a character vector containing column names as input, and uses tidy verse quoting functions inside in a safe manner.
I believe I have a working example of what I want to do. I would like to know if there is a more elegant solution or I am thinking about this problem incorrectly (perhaps I shouldn't want to do this?). From what I can tell, in order to avoid variable scoping issues I need to wrap the column names in .data[[]] and make it an expression before unquoting for tidy verse NSE verbs.
Regarding previous questions this answer is along the right lines but I want to abstract the code into a function. A github issue asks about this but using rlang::syms won't work as far as I can tell because the combination of the column labels with .data makes it an expression not a symbol. Here and here solve the problem but as far as I can tell don't account for a subtle bug in which the variables can leak in from the environment if they don't exist as column labels in the dataframe or the solutions don't work for the input being a vector of labels.
# Setup
suppressWarnings(suppressMessages(library("dplyr")))
suppressWarnings(suppressMessages(library("rlang")))
# define iris with and without Sepal.Width column
iris <- tibble::as_tibble(iris)
df_with_missing <- iris %>% select(-Sepal.Width)
# This should not be findable by my function
Sepal.Width <- iris$Sepal.Width * -1
################
# Now lets try a function for which we programmatically define the column labels
programmatic_mutate_y <- function(df, col_names, safe = FALSE) {
# Add .data[[]] to the col_names to make evalutation safer
col_exprs <- rlang::parse_exprs(
purrr::map_chr(
col_names,
~ glue::glue(stringr::str_c('.data[["{.x}"]]'))
)
)
output <- dplyr::mutate(df, product = purrr::pmap_dbl(list(!!!col_exprs), ~ prod(...)))
output
}
################
# The desired output
testthat::expect_error(programmatic_mutate_y(df_with_missing, c("Sepal.Width", "Sepal.Length")))
programmatic_mutate_y(iris, c("Sepal.Width", "Sepal.Length"))
#> # A tibble: 150 x 6
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species product
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 17.8
#> 2 4.9 3 1.4 0.2 setosa 14.7
#> 3 4.7 3.2 1.3 0.2 setosa 15.0
#> 4 4.6 3.1 1.5 0.2 setosa 14.3
#> 5 5 3.6 1.4 0.2 setosa 18
#> 6 5.4 3.9 1.7 0.4 setosa 21.1
#> 7 4.6 3.4 1.4 0.3 setosa 15.6
#> 8 5 3.4 1.5 0.2 setosa 17
#> 9 4.4 2.9 1.4 0.2 setosa 12.8
#> 10 4.9 3.1 1.5 0.1 setosa 15.2
#> # … with 140 more rows
Created on 2019-08-09 by the reprex package (v0.3.0)
I think you are making things complicated. With the _at
variant, you can use strings as arguments in almost every dplyr
functions. purrr::pmap_dbl()
is used to map calculation by rows.
programmatic_mutate_y_v1 <- function(df, col_names, safe = FALSE) {
df["product"] <- purrr::pmap_dbl(dplyr::select_at(df,col_names),prod)
return(df)
}
programmatic_mutate_y_v1(iris, c("Sepal.Width", "Sepal.Length"))
# A tibble: 150 x 6
Sepal.Length Sepal.Width Petal.Length Petal.Width Species product
<dbl> <dbl> <dbl> <dbl> <fct> <dbl>
1 5.1 3.5 1.4 0.2 setosa 17.8
2 4.9 3 1.4 0.2 setosa 14.7
3 4.7 3.2 1.3 0.2 setosa 15.0
4 4.6 3.1 1.5 0.2 setosa 14.3
5 5 3.6 1.4 0.2 setosa 18
6 5.4 3.9 1.7 0.4 setosa 21.1
7 4.6 3.4 1.4 0.3 setosa 15.6
8 5 3.4 1.5 0.2 setosa 17
9 4.4 2.9 1.4 0.2 setosa 12.8
10 4.9 3.1 1.5 0.1 setosa 15.2
# ... with 140 more rows