I have been trying for some time to understand the tidyverse design and how to program with it. I was writing a function that uses tidyselect semantics, and I found that tidyselect::eval_select appends numbers to the names given on the lhs of expressions. This was not surprising, seeing that these semantics are used for column renaming. Unfortunately, my function, which is meant for building a data structure, doesn't need this behavior; it needs the plain name provided on the lhs of the expression (duplicated as many times as necessary). I haven't managed to find out where this behavior comes from; it looks like make.unique, but I can't find where it is implemented. If you know, I am quite curious to learn; if not, solving my problem shouldn't depend on it.
All I want is for the lhs names to not have appended numbers, as in the example:
library(tidyverse)
# Data
data <- mtcars[, 8:11]
# Example
data %>%
  tidyselect::eval_select(rlang::expr(c(foo = 1, bar = c(2:4), foobar = c(1, "am", "gear", "carb"))), .)
#> foo bar1 bar2 bar3 foobar1 foobar2 foobar3 foobar4
#> 1 2 3 4 1 2 3 4
# Function
test <- function(.data, ...) {
  loc <- tidyselect::eval_select(rlang::expr(c(...)), .data)
  names <- names(.data)
  list(names(loc), names[loc])
}
data %>%
  test(foo = 1, bar = c(2:4), foobar = c(1, "am", "gear", "carb"))
#> [[1]]
#> [1] "foo" "bar1" "bar2" "bar3" "foobar1" "foobar2" "foobar3"
#> [8] "foobar4"
#>
#> [[2]]
#> [1] "vs" "am" "gear" "carb" "vs" "am" "gear" "carb"
Created on 2021-05-22 by the reprex package (v2.0.0)
Desired output:
#> [[1]]
#> [1] "foo" "bar" "bar" "bar" "foobar" "foobar" "foobar"
#> [8] "foobar"
#>
#> [[2]]
#> [1] "vs" "am" "gear" "carb" "vs" "am" "gear" "carb"
Any help is greatly appreciated.
The problem is caused by a function called ensure_named, which is deeply nested inside eval_select's implementation. It is part of the vars_select_eval function.
ensure_named(pos, vars, uniquely_named, allow_rename)
The good news is that we just need to override the uniquely_named argument. This argument is passed down from the first implementation function, eval_select_impl, which is called by eval_select itself. So all we need to do is rewrite tidyselect::eval_select.
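On the make.unique guess in the question: a quick base R check suggests the suffixes are not produced by make.unique() itself, because that function leaves the first occurrence untouched and separates the counter with a dot. This is only a side check and says nothing about how tidyselect builds the names internally; the variable name `suffixed` is just for illustration.

```r
# Base R only: what make.unique() does with repeated names
suffixed <- make.unique(c("bar", "bar", "bar"))
print(suffixed)
#> [1] "bar"   "bar.1" "bar.2"
```

Compare this with the "bar1", "bar2", "bar3" pattern returned by eval_select in the question.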
To get the wanted output we need to do two things:
1. Add uniquely_named = NULL as an argument and set it to FALSE when calling the function.
2. Specify name_spec = "{outer}". Doing only this step will not suffice unless uniquely_named is set to FALSE.
Before the actual code, a note of caution:
tidyselect::eval_select deliberately does not allow duplicate column names.
For starters, it is not easy to create a tibble with duplicate column names:
tibble(a = 1:3, b = 4:6, a = 7:9)
#> Error: Column name `a` must not be duplicated.
#> Use .name_repair to specify repair.
One workaround is to use a list with tibble::new_tibble:
tibble::new_tibble(list(a = 1:3, b = 4:6, a = 7:9), nrow = 3)
#> # A tibble: 3 x 3
#> a b a
#> <int> <int> <int>
#> 1 1 4 7
#> 2 2 5 8
#> 3 3 6 9
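The error message above also points at the .name_repair argument of tibble(). As a small sketch (the object names `dup` and `fixed` are only for illustration), "minimal" keeps the duplicated names as they are, while "unique" repairs them with positional suffixes:

```r
library(tibble)

# "minimal" performs no repair, so the duplicated name `a` is kept
dup <- tibble(a = 1:3, b = 4:6, a = 7:9, .name_repair = "minimal")
names(dup)
#> [1] "a" "b" "a"

# "unique" makes the names unique by appending the column position
fixed <- tibble(a = 1:3, b = 4:6, a = 7:9, .name_repair = "unique")
names(fixed)
#> [1] "a...1" "b"     "a...3"
```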
For a data.frame, it is only possible to create non-unique names when the check.names argument is set to FALSE:
data.frame(a = 1:3, b = 4:6, a = 7:9, check.names = FALSE)
#> a b a
#> 1 1 4 7
#> 2 2 5 8
#> 3 3 6 9
But when we use this data.frame with regular {dplyr} verbs, an error is thrown, telling us that we cannot transform data frames with duplicate names:
data.frame(a = 1:3, b = 4:6, a = 7:9, check.names = FALSE) %>%
  mutate(c = 1:3)
#> Error: Can't transform a data frame with duplicate names.
From this we can assume that using data frames with duplicate names is not recommended in the {tidyverse}. It probably contradicts the notion of tidy data.
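That being said, below is the approach described above. Note that it calls the unexported function tidyselect:::eval_select_impl, so it may stop working if tidyselect's internals change: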
library(tidyverse)
# Data
data <- mtcars[, 8:11]
# custom eval_select function
my_eval_select <- function(expr, data,
                           env = rlang::caller_env(),
                           ..., include = NULL,
                           exclude = NULL, strict = TRUE,
                           name_spec = NULL,
                           uniquely_named = NULL, # this is the new argument
                           allow_rename = TRUE) {
  ellipsis::check_dots_empty()
  tidyselect:::eval_select_impl(data, names(data), rlang::as_quosure(expr, env),
                                include = include, exclude = exclude, strict = strict,
                                name_spec = name_spec, allow_rename = allow_rename,
                                uniquely_named = uniquely_named) # which we also add here
}
# example 1
data %>%
  my_eval_select(rlang::expr(c(foo = 1, bar = c(2:4), foobar = c(1, "am", "gear", "carb"))),
                 data = .,
                 name_spec = "{outer}", # we need to specify this
                 uniquely_named = FALSE) # and this
#> foo bar bar bar foobar foobar foobar foobar
#> 1 2 3 4 1 2 3 4
# example: custom function
test <- function(.data, ...) {
  loc <- my_eval_select(rlang::expr(c(...)),
                        data = .data,
                        name_spec = "{outer}",
                        uniquely_named = FALSE)
  names <- names(.data)
  list(names(loc), names[loc])
}
# test
data %>%
  test(foo = 1, bar = c(2:4), foobar = c(1, "am", "gear", "carb"))
#> [[1]]
#> [1] "foo" "bar" "bar" "bar" "foobar" "foobar" "foobar" "foobar"
#>
#> [[2]]
#> [1] "vs" "am" "gear" "carb" "vs" "am" "gear" "carb"
Created on 2021-05-22 by the reprex package (v0.3.0)
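If depending on an unexported function is a concern, a possible alternative is to evaluate each argument in ... on its own with the exported tidyselect::eval_select and repeat the outer name once per selected column. This is only a sketch under the assumption that every argument is an independent selection; test_public is a hypothetical name and not part of the approach above.

```r
# Hypothetical alternative that uses only exported functions
test_public <- function(.data, ...) {
  # Capture each argument of ... as its own quosure, keeping the lhs names
  quos <- rlang::enquos(...)
  # Evaluate every selection separately against the data
  locs <- lapply(quos, function(q) tidyselect::eval_select(q, .data))
  # Repeat each lhs name once per column that its selection matched
  outer <- rep(names(quos), lengths(locs))
  # Flatten the positions and look up the original column names
  pos <- unlist(locs, use.names = FALSE)
  list(outer, names(.data)[pos])
}

res <- test_public(mtcars[, 8:11],
                   foo = 1, bar = c(2:4), foobar = c(1, "am", "gear", "carb"))
print(res)
```

Because each selection is evaluated in isolation, the renaming logic of c() with named arguments never comes into play, so no name_spec or uniquely_named argument is needed. Unnamed arguments in ... would end up with an empty string as their outer name.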