The "tidyselect" package offers a selection helper called where(), which selects data frame columns using a custom predicate function. It is an internal function of "tidyselect". That means where() is not exported to your namespace and you can only call it as tidyselect:::where.
However, I saw the following example in the dplyr vignette on column-wise operations:
starwars %>%
  summarise(across(where(is.character), ~ length(unique(.x))))
#> # A tibble: 1 x 8
#>    name hair_color skin_color eye_color   sex gender homeworld species
#>   <int>      <int>      <int>     <int> <int>  <int>     <int>   <int>
#> 1    87         13         31        15     5      3        49      38
In this example, where() is written without the tidyselect::: prefix, yet the code runs without errors and produces a meaningful result. This seems odd to me. I would like to know why the code works.
I guess it is due to "code quotation", which is part of the tidyeval methodology. Roughly speaking, code quotation captures the code as unevaluated expressions and evaluates those expressions later in an "inner environment". This is only an intuitive guess and I don't know how to test it.
I hope someone can help me with the where() problem, or point me to some references on how this code works.
You did not say what packages are attached in your example, but let's assume that the only attached package is dplyr.
library(dplyr)
First, we notice that the function where() is not attached, i.e. not known to the current R session. We can check by typing its name (without parentheses) in the console. If the function were attached, we would see its source code. Instead we get an error that object 'where' was not found.
However, we note that dplyr attaches other functions from tidyselect, with starts_with() being an example. If we repeat the experiment of typing the name in the console, we now see the source code and also that the function originates in the tidyselect namespace:
> starts_with
function (match, ignore.case = TRUE, vars = NULL)
{
    check_match(match)
    vars <- vars %||% peek_vars(fn = "starts_with")
    if (ignore.case) {
        vars <- tolower(vars)
        match <- tolower(match)
    }
    flat_map_int(match, starts_with_impl, vars)
}
<bytecode: 0x0000027338e5f8e8>
<environment: namespace:tidyselect>
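In this case, starts_with() is re-exported by dplyr through its NAMESPACE file, where a package lists the functions it imports from other packages and the functions it exports. Because dplyr both imports and exports starts_with(), the function becomes available as soon as dplyr is attached. You can check this in the dplyr source code.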
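The same check can be done programmatically instead of reading the NAMESPACE file. The snippet below is a minimal sketch that assumes dplyr and tidyselect are installed; the result for where() is deliberately left without an expected value, because it depends on the installed package versions.

```r
library(dplyr)

# Is starts_with part of dplyr's exported interface?
"starts_with" %in% getNamespaceExports("dplyr")
#> [1] TRUE

# In which namespace was the function actually defined?
environmentName(environment(dplyr::starts_with))
#> [1] "tidyselect"

# The same questions for where(); the answers depend on the installed versions.
"where" %in% getNamespaceExports("dplyr")
exists("where", mode = "function")
```

If the last two lines return FALSE, the session matches the situation described in this answer.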
But where() is not made available this way, as we have already seen. In this case the function is indeed quoted and only evaluated within the tidyselect package. If you look at the source code for across, you'll notice that in line 82 the column specification is passed to the function across_setup, defined in the same file. In this function, the column specification is quoted (lines 174, 175) and then sent to the tidyselect function tidyselect::eval_select (line 177). That function is part of the tidyselect package and therefore has access to where().
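To make the mechanism concrete, here is a minimal base R sketch of the same idea: an expression is captured without being evaluated and later evaluated in an environment where a helper is visible. The names select_demo and where_demo are hypothetical and exist only for this illustration; this is not how tidyselect is actually implemented, only the general quote-then-evaluate pattern.

```r
# Hypothetical selector: quotes its argument, then evaluates it where a
# helper that the caller cannot see is available.
select_demo <- function(data, selection) {
  expr <- substitute(selection)            # quote: capture code, do not run it

  # Evaluation environment: holds the helper, falls back to the caller's
  # environment for everything else (e.g. is.character).
  mask <- new.env(parent = parent.frame())
  mask$where_demo <- function(predicate) predicate

  predicate <- eval(expr, envir = mask)    # evaluate later, with the helper visible
  data[vapply(data, predicate, logical(1))]
}

df <- data.frame(a = 1:3, b = letters[1:3], stringsAsFactors = FALSE)

# The helper is not known to the calling session ...
exists("where_demo")
#> [1] FALSE

# ... yet it can be used inside the quoted argument.
result <- select_demo(df, where_demo(is.character))
names(result)
#> [1] "b"
```

Calling where_demo(is.character) directly at the console would fail with a "could not find function" error, which mirrors what happens with where() outside of a selection context.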