r dplyr tidyverse rlang tidyselect

R: Tidyverse selection semantics: tidyselect::eval_select appends numbers to duplicated names


I have been trying for some time to understand the tidyverse design and how to program with it. While writing a function that uses tidyselect semantics, I found that tidyselect::eval_select appends numbers to the left-hand-side (lhs) names whenever one expression selects several columns. This is not surprising, since the same semantics are used for column renaming. Unfortunately, my function, which builds a data structure, doesn't need this behavior: it needs the plain name provided on the lhs of the expression, repeated as many times as necessary. I haven't managed to find out where this behavior comes from; it looks like make.unique, but I can't find where it is implemented. If you know, I am curious to learn, but solving my problem shouldn't depend on it. All I want is for the lhs names not to have numbers appended, as in the example:

library(tidyverse)

# Data
data <- mtcars[, 8:11]

# Example
data %>%
  tidyselect::eval_select(rlang::expr(c(foo = 1, bar = c(2:4), foobar = c(1, "am", "gear", "carb"))), .)
#>     foo    bar1    bar2    bar3 foobar1 foobar2 foobar3 foobar4 
#>       1       2       3       4       1       2       3       4

# Function
test <- function(.data, ...) {
  loc <- tidyselect::eval_select(rlang::expr(c(...)), .data)
  names <- names(.data)
  list(names(loc), names[loc])
}

data %>%
  test(foo = 1, bar = c(2:4), foobar = c(1, "am", "gear", "carb"))
#> [[1]]
#> [1] "foo"     "bar1"    "bar2"    "bar3"    "foobar1" "foobar2" "foobar3"
#> [8] "foobar4"
#> 
#> [[2]]
#> [1] "vs"   "am"   "gear" "carb" "vs"   "am"   "gear" "carb"

Created on 2021-05-22 by the reprex package (v2.0.0)

Desired output:

#> [[1]]
#> [1] "foo"    "bar"    "bar"    "bar"    "foobar" "foobar" "foobar" "foobar"
#> 
#> [[2]]
#> [1] "vs"   "am"   "gear" "carb" "vs"   "am"   "gear" "carb"

Any help is greatly appreciated.


Solution

  • The problem is caused by a function called ensure_named, which is nested deep inside eval_select's implementation. It is part of the vars_select_eval function.

    ensure_named(pos, vars, uniquely_named, allow_rename)

    The good news is that we only need to override the uniquely_named argument. This argument is passed down from the first implementation function, eval_select_impl, which is called by eval_select itself. So all we need to do is write our own version of tidyselect::eval_select.

    To get the desired output we need to do two things:

    1. Add uniquely_named = NULL as an argument and set it to FALSE when calling the function.
    2. Set the existing argument name_spec = "{outer}". This step alone will not suffice; uniquely_named must also be set to FALSE.
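
    The effect of name_spec can be looked at in isolation with vctrs::vec_c, whose .name_spec argument accepts the same kind of glue string. This is only an illustrative sketch: it assumes that tidyselect hands the spec on to {vctrs} in a comparable way, and the "{outer}{inner}" spec is chosen here merely to mimic the numbered names; it is not claimed to be tidyselect's internal default. Glue specs also require the {glue} package to be installed.

    ```r
    library(vctrs)

    # Outer name "bar" pasted together with the inner positions 1, 2, 3
    numbered <- vec_c(bar = 2:4, .name_spec = "{outer}{inner}")
    names(numbered)

    # Only the outer name is kept, so it is repeated for every element
    repeated <- vec_c(bar = 2:4, .name_spec = "{outer}")
    names(repeated)
    ```

    The second call shows why "{outer}" is the spec needed here: it drops the inner position and keeps just the lhs name.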

    Before the actual code, a note of caution:

    tidyselect::eval_select deliberately does not allow duplicate column names.

    For starters, it is not possible to easily create a tibble with duplicate column names:

    tibble(a = 1:3, b = 4:6, a = 7:9)
    #> Error: Column name `a` must not be duplicated.
    #> Use .name_repair to specify repair.
    

    One workaround is to use a list with tibble::new_tibble:

    tibble::new_tibble(list(a = 1:3, b = 4:6, a = 7:9), nrow = 3)
    #> # A tibble: 3 x 3
    #>       a     b     a
    #>   <int> <int> <int>
    #> 1     1     4     7
    #> 2     2     5     8
    #> 3     3     6     9
    

    For a data.frame, non-unique names can only be created when the check.names argument is set to FALSE:

    data.frame(a = 1:3, b = 4:6, a = 7:9, check.names = FALSE)
    #>   a b a
    #> 1 1 4 7
    #> 2 2 5 8
    #> 3 3 6 9
    

    But when we use this data.frame with regular {dplyr} verbs, an error will be thrown, telling us that we cannot transform data frames with duplicate names:

    data.frame(a = 1:3, b = 4:6, a = 7:9, check.names = FALSE) %>% 
      mutate(c = 1:3)
    #> Error: Can't transform a data frame with duplicate names.
    

    From this we can conclude that using data.frames with duplicate names is not recommended in the {tidyverse}. It probably contradicts the notion of tidy data.
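
    If such a data.frame has to be passed on to {dplyr} anyway, the duplicates can be detected and repaired beforehand. A minimal base R sketch, assuming that suffixed names such as a.1 are acceptable for the later steps (the variable names has_dups and repaired are only illustrative):

    ```r
    df <- data.frame(a = 1:3, b = 4:6, a = 7:9, check.names = FALSE)

    # anyDuplicated() returns 0 when all names are unique,
    # otherwise the index of the first duplicated name
    has_dups <- anyDuplicated(names(df)) > 0

    # make.unique() appends ".1", ".2", ... to repeated names
    repaired <- df
    names(repaired) <- make.unique(names(df))
    names(repaired)
    ```

    Note that make.unique separates the suffix with a dot, which differs from the bar1, bar2 pattern produced by eval_select.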

    That said, here is the approach described above:

    library(tidyverse)
    
    # Data
    data <- mtcars[, 8:11]
    
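    # custom eval_select function; eval_select_impl is an internal (unexported)
    # tidyselect function, so its signature may change between versions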
    my_eval_select <- function(expr, data,
                               env = rlang::caller_env(),
                               ..., include = NULL, 
                               exclude = NULL, strict = TRUE,
                               name_spec = NULL,
                               uniquely_named = NULL, # this is the new argument
                               allow_rename = TRUE) {
      ellipsis::check_dots_empty()
      tidyselect:::eval_select_impl(data, names(data), rlang::as_quosure(expr, env), 
                       include = include, exclude = exclude, strict = strict, 
                       name_spec = name_spec, allow_rename = allow_rename,
                       uniquely_named = uniquely_named) # which we also add here
    }
    
    # example 1
    data %>%
      my_eval_select(rlang::expr(c(foo = 1, bar = c(2:4), foobar = c(1, "am", "gear", "carb"))),
                              data = .,
                              name_spec = "{outer}",  # we need to specify this
                              uniquely_named = FALSE) # and this
    #>    foo    bar    bar    bar foobar foobar foobar foobar 
    #>      1      2      3      4      1      2      3      4
    
    # example: custom function
    test <- function(.data, ...) {
      loc <- my_eval_select(rlang::expr(c(...)),
                            data = .data,
                            name_spec = "{outer}",
                            uniquely_named = FALSE)
      names <- names(.data)
      list(names(loc), names[loc])
    }
    
    # test
    data %>%
      test(foo = 1, bar = c(2:4), foobar = c(1, "am", "gear", "carb"))
    #> [[1]]
    #> [1] "foo"    "bar"    "bar"    "bar"    "foobar" "foobar" "foobar" "foobar"
    #> 
    #> [[2]]
    #> [1] "vs"   "am"   "gear" "carb" "vs"   "am"   "gear" "carb"
    

    Created on 2021-05-22 by the reprex package (v0.3.0)
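
    If relying on the unexported tidyselect:::eval_select_impl is a concern, a similar result can probably be obtained with exported functions only, by evaluating each argument in ... separately and repeating its name once per selected column. This is a sketch rather than a drop-in replacement: test_by_arg is a hypothetical name, and unnamed arguments would end up with an empty string as their name.

    ```r
    library(tidyselect)
    library(rlang)

    data <- mtcars[, 8:11]

    test_by_arg <- function(.data, ...) {
      quos <- enquos(...)
      # One eval_select() call per argument, so outer and inner names
      # never have to be combined
      locs <- lapply(quos, function(q) eval_select(q, data = .data))
      # Repeat each lhs name once per column selected by that argument
      lhs <- rep(names(quos), lengths(locs))
      list(lhs, names(.data)[unlist(locs, use.names = FALSE)])
    }

    res <- test_by_arg(data, foo = 1, bar = c(2:4),
                       foobar = c(1, "am", "gear", "carb"))
    res
    ```

    The trade-off is that selections are resolved per argument, so helpers that depend on the other arguments (for example negative selections spanning several arguments) behave differently than in a single c(...) call.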