Tags: r, dplyr, tidyverse, tidyselect

Why is the tidyselect helper function "where" found inside the dplyr function "across"?


The "tidyselect" package offers a selection helper, where, which selects data frame columns using a custom predicate function. In the version I am using, where is an internal (unexported) function of "tidyselect". That means where is not made available in your session when you load the package, and the only way to call it directly is tidyselect:::where.

However, I saw the following example in the dplyr vignette "Column-wise operations":

starwars %>% 
  summarise(across(where(is.character), ~ length(unique(.x))))
#> # A tibble: 1 x 8
#>    name hair_color skin_color eye_color   sex gender homeworld species
#>   <int>      <int>      <int>     <int> <int>  <int>     <int>   <int>
#> 1    87         13         31        15     5      3        49      38

In this example, where is written without the "tidyselect:::" prefix, yet the code runs without errors and produces a meaningful result. This seems odd to me, and I would like to know why the code works.

I guess it is due to "code quotation", which is part of the tidy evaluation (tidyeval) framework. Roughly speaking, quotation captures code as an expression instead of running it, and the expression is evaluated later in an "inner" environment. This is only an intuitive guess, and I don't know how to test it.
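One way to test this guess in plain base R, without tidyselect, is sketched below. It assumes a hypothetical helper called my_where that is not part of any package: a quoted call mentioning my_where can be created without error, fails when evaluated in the global environment, and works when evaluated in an environment that defines my_where.

```r
# `my_where` is a hypothetical helper name that is not defined anywhere in the session.
expr <- quote(my_where(is.character))  # quoting only builds the call; nothing runs yet
class(expr)                            # "call"

# Evaluating the call in the global environment fails: the name cannot be found
result <- tryCatch(eval(expr), error = function(e) conditionMessage(e))
result  # an error message mentioning my_where

# Evaluating the same call in an environment that binds `my_where` succeeds
inner <- new.env()
inner$my_where <- function(fn) function(col) isTRUE(fn(col))
is_chr_col <- eval(expr, envir = inner)
is_chr_col(c("a", "b"))  # TRUE
is_chr_col(1:3)          # FALSE
```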

I hope someone can help me with the "where" problem, or point me to references that explain how this code works.


Solution

  • You did not say which packages are attached in your example, so let's assume that dplyr is the only attached package.

    library(dplyr)
    

    First, note that the function where is not attached, i.e. it is not visible from the current R session. We can check this by typing its name (without parentheses) in the console. If the function were attached, we would see its source code. Instead, we get an error saying that object 'where' was not found.
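    The same check can be done programmatically in base R. The function locate_fn below is a hypothetical helper (not part of dplyr or tidyselect), a minimal sketch that reports whether a name is visible in the session, whether a package's namespace defines it, and whether the package exports it. The result for where depends on the installed tidyselect version, so those calls are left as comments.

    ```r
    # Hypothetical helper: where can `name` be found, relative to package `pkg`?
    locate_fn <- function(name, pkg) {
      ns <- asNamespace(pkg)  # loads the package's namespace without attaching it
      c(
        # global environment plus everything on the search path
        visible_in_session = exists(name, envir = globalenv()),
        # defined inside the package, whether exported or internal
        defined_in_namespace = exists(name, envir = ns, inherits = FALSE),
        # part of the package's public (exported) interface
        exported = name %in% getNamespaceExports(pkg)
      )
    }

    locate_fn("lm", "stats")  # all three TRUE in a default R session
    # With tidyselect installed, compare for example:
    # locate_fn("where", "tidyselect")
    # locate_fn("starts_with", "tidyselect")
    ```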

    However, dplyr does make some other tidyselect functions available, starts_with being one example. If we repeat the experiment of typing its name in the console, we now see the source code, and the last line shows that the function originates in the tidyselect namespace:

    > starts_with
    function (match, ignore.case = TRUE, vars = NULL) 
    {
        check_match(match)
        vars <- vars %||% peek_vars(fn = "starts_with")
        if (ignore.case) {
            vars <- tolower(vars)
            match <- tolower(match)
        }
        flat_map_int(match, starts_with_impl, vars)
    }
    <bytecode: 0x0000027338e5f8e8>
    <environment: namespace:tidyselect>
    

    In this case starts_with is re-exported by dplyr: dplyr's NAMESPACE file imports the function from tidyselect and also lists it for export, so it becomes available whenever dplyr is attached. You can check this in the dplyr source code.
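    For illustration, a re-export usually appears in a package's NAMESPACE file as a pair of directives like the ones below. This is a sketch of the general pattern, not a verbatim excerpt of dplyr's file, whose exact contents may differ between versions.

    ```r
    # NAMESPACE directives (declarations read when the package loads, not a script to run)
    importFrom(tidyselect,starts_with)  # make starts_with usable inside the package
    export(starts_with)                 # and pass it on to users who attach the package
    ```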

    But where is not made available this way, as we have already seen. Instead, the code that uses it is quoted and only evaluated inside the tidyselect package. If you look at the source code for across, you'll notice that in line 82 the column specification is passed to the function across_setup, defined in the same file. In that function, the column specification is quoted (lines 174 and 175) and then sent to the tidyselect function tidyselect::eval_select (line 177). eval_select belongs to the tidyselect package and evaluates the quoted expression in a context in which where is available.
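    The quote-then-evaluate pattern can be imitated in a few lines of base R. This is a simplified sketch under stated assumptions, not how tidyselect is actually implemented: local() stands in for a package namespace, where_helper is a hypothetical internal helper playing the role of where, substitute() plays the role of the quoting step in across_setup, and eval() in a purpose-built environment plays the role of tidyselect::eval_select.

    ```r
    # local() simulates a package namespace: `where_helper` lives only in this
    # private environment, and only the returned function reaches the user.
    select_names <- local({
      # Hypothetical internal helper (the role of where): turns a predicate
      # into a test applied to each column.
      where_helper <- function(fn) {
        function(col) isTRUE(fn(col))
      }

      function(data, spec) {
        # Quote the selection instead of evaluating it (the across_setup step)
        expr <- substitute(spec)
        # Evaluation environment: its parent is the caller's environment, so the
        # caller's objects stay visible, and the private helper is bound in it
        mask <- new.env(parent = parent.frame())
        assign("where_helper", where_helper, envir = mask)
        # Evaluate the quoted code inside that environment (the eval_select step)
        keep <- eval(expr, envir = mask)
        names(data)[vapply(data, keep, logical(1))]
      }
    })

    df <- data.frame(
      name = c("Luke", "Leia"),
      height = c(172, 150),
      species = c("Human", "Human"),
      stringsAsFactors = FALSE
    )

    select_names(df, where_helper(is.character))  # returns c("name", "species")
    exists("where_helper")                        # FALSE: not visible at top level
    ```

    Calling where_helper(is.character) directly at the console would fail, just as where does in the question, yet it works inside select_names because the expression is only evaluated in the environment that select_names builds.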