
Write your own tidyselect functions


I wrote an R package that uses the {tidyselect} selection helpers (e.g. contains(), starts_with()). I would like to add a few more helpers to the package that select variables based on some attribute, for example all numeric variables or all logical variables.

I have reviewed the {tidyselect} source code, but I can't work out how the registration of variables works, and therefore can't extend it to select variables by their attributes.

I have done some searching, and it looks like the {recipes} package has successfully implemented additional helpers like the ones I am looking for (e.g. all_numeric()), but I am struggling to write such extension functions myself. https://github.com/tidymodels/recipes/blob/master/R/selections.R

What it comes down to, I believe, is that I do not understand what happens when variables are registered with tidyselect::scoped_vars(). If I run tidyselect::scoped_vars(vars = names(mtcars)) in a clean environment, I don't see any changes being made, yet I am able to use the {tidyselect} helpers in the global environment after registering the variables.

names(mtcars)
#>  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
#> [11] "carb"
tidyselect::scoped_vars(vars = names(mtcars))

# returns position of column 'mpg'
tidyselect::starts_with("mp")
#> [1] 1

Any tips or direction to some documentation would be GREATLY appreciated! Thank you!


Solution

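  • When you call scoped_vars(), the given variable names are saved in an internal environment for the duration of the current function call. At the top level there is no enclosing function call to end, so the names stay registered, which explains why the helpers keep working in your global environment. Inside a function, the registration is undone when the function exits: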

    (function() {
      print(tidyselect:::vars_env$selected)
      tidyselect::scoped_vars(names(mtcars))
      print(tidyselect:::vars_env$selected)
    })()
    #> NULL
    #>  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
    #> [11] "carb"
    
    print(tidyselect:::vars_env$selected)
    #> NULL
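
    The same save-and-restore behaviour can be observed without reaching into internals like vars_env. This is a minimal sketch, assuming your installed {tidyselect} exports with_vars() and peek_vars(): with_vars() registers the names only while its second argument is evaluated, so a helper called inside it sees the mtcars column names.

    ```r
    # with_vars() registers the names only while the expression is evaluated
    inside <- tidyselect::with_vars(names(mtcars), tidyselect::peek_vars())

    # starts_with() looks up the registered names, so it can resolve "mp"
    pos <- tidyselect::with_vars(names(mtcars), tidyselect::starts_with("mp"))

    print(inside)  # the names that were registered during the call
    print(pos)     # position of "mpg" among those names
    ```

    Using with_vars() rather than calling scoped_vars() at the top level keeps the registration from leaking into the global session.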
    

    As far as I can tell, this is the only information that {tidyselect} keeps about the variables; so if you want to select based on attributes, you have to maintain the attribute information yourself. This is also what {recipes} does, with a cur_info_env environment.

    A crude implementation could look something like this:

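    # Package-level environment that holds column classes during a selection
    type_env <- rlang::new_environment()
    
    select_with_attributes <- function(.data, ...) {
      # Record each column's class before selecting, and clear it again on exit
      type_env$types <- purrr::map(.data, class)
      on.exit(type_env$types <- NULL, add = TRUE)
      dplyr::select(.data, ...)
    }
    
    all_numeric <- function() {
      # Positions of columns whose class is double ("numeric") or integer
      which(purrr::map_lgl(type_env$types, ~ any(.x %in% c("numeric", "integer"))))
    }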
    
    head(select_with_attributes(iris, all_numeric()))
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width
    #> 1          5.1         3.5          1.4         0.2
    #> 2          4.9         3.0          1.4         0.2
    #> 3          4.7         3.2          1.3         0.2
    #> 4          4.6         3.1          1.5         0.2
    #> 5          5.0         3.6          1.4         0.2
    #> 6          5.4         3.9          1.7         0.4
    

    Created on 2019-06-13 by the reprex package (v0.2.1)
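
    Newer {tidyselect} releases (1.1.0 and later, as far as I know) added where(), which applies a predicate function to every column, so attribute-based helpers no longer need a separate type environment. A minimal sketch, assuming tidyselect >= 1.1.0; all_numeric() and all_logical() here are hypothetical package helpers, and the small data frame is made up for illustration.

    ```r
    # Hypothetical helpers: each one returns a where() predicate selection
    all_numeric <- function() tidyselect::where(is.numeric)
    all_logical <- function() tidyselect::where(is.logical)

    # Small illustrative data frame with mixed column types
    df <- data.frame(
      x = 1:3,
      y = c(TRUE, FALSE, NA),
      z = c(0.5, 1.5, 2.5),
      s = c("a", "b", "c"),
      stringsAsFactors = FALSE
    )

    # eval_select() evaluates a selection expression against df and
    # returns a named integer vector of the selected column positions
    num_pos <- tidyselect::eval_select(rlang::expr(all_numeric()), df)
    lgl_pos <- tidyselect::eval_select(rlang::expr(all_logical()), df)

    print(num_pos)
    print(lgl_pos)
    ```

    Since dplyr::select() is built on the same selection machinery in recent dplyr versions, dplyr::select(iris, all_numeric()) should behave the same way there.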