Tags: r, dplyr, scope, symbols, evaluation

R: Why is a free variable within a function recognized as an unquoted column name?


My understanding is that R's lexical scoping resolves a free variable within a function by searching the environment in which the function was defined and then that environment's parents. However, I can't reconcile this with the fact that a particular function call does not raise the error I would expect.
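As a minimal base-R illustration of the rule I have in mind (the names `add_y` and `caller` are made up for this sketch and are not part of my actual code), a free variable is looked up where the function was defined, not where it is called:

```r
# `y` lives in the environment where add_y() is defined
y <- 10

add_y <- function(x) {
  # `y` is neither an argument nor a local variable, so it is a free variable;
  # R finds it in the environment that encloses add_y()
  x + y
}

caller <- function() {
  y <- 1000  # a local `y` in the caller is not visible to add_y()
  add_y(1)
}

result <- caller()
print(result)  # 11, not 1001
```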

Suppose I define a function foo in the global environment and pass it arguments that are either objects in the global environment (e.g., a data.frame) or the unquoted names of columns of that object.

library(dplyr)

# Example input objects
dv <- "c"
df <- data.frame(x = rep(c(3,NA_real_), 5),
                 y = letters[1:10],
                 z = 1:10)

# Define a function
foo <- function(df, dv, response, treat) {
  df %>%
    filter(y %in% dv) %>%
    filter(!is.na(response)) %>%
    select(treat)
}

My understanding is that y is a free variable here, so I would expect R to look for y in the global environment where foo was defined, find nothing, and throw an error. However, the only error raised is unrelated to y:

foo(df = df, dv = dv, response = x, treat = z)
#> Error in `filter()`:
#> ! Problem while computing `..1 = !is.na(response)`.
#> Caused by error in `mask$eval_all_filter()`:
#> ! object 'x' not found

While we can fix that error by quoting and unquoting (see below), it remains unclear to me how y is recognized as an unquoted column name rather than producing an error.

foo_new <- function(df, dv, response, treat) {
  response <- enquo(response)
  treat <- enquo(treat)
  
  df %>%
    filter(y %in% dv) %>%
    filter(!is.na(!!response)) %>%
    select(!!treat)
}

foo_new(df, dv, x, z)
#>   z
#> 1 3

Solution

  • It might help to be more explicit about quoted vs. unquoted expressions and about the environments that objects come from. If I were to roll foo up into an R package, this is what I'd do (using roxygen2 comments to make the types of the function arguments explicit).

    #' Test function
    #' 
    #' @param df A `data.frame`.
    #' @param dv A `character` scalar.
    #' @param response An unquoted expression corresponding to a column in `df`.
    #' @param treat An unquoted expression corresponding to a column in `df`.
    #' 
    #' @importFrom dplyr filter select
    #' @importFrom magrittr "%>%"
    #' @importFrom rlang .data
    foo_explicit <- function(df, dv, response, treat) {
        df %>%
            filter(.data$y %in% dv) %>%
            filter(!is.na({{ response }})) %>%
            select({{ treat }})
    }
    

    A few comments:

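    • .data$y inside filter makes it explicit that y is a column within df. A bare y already works because filter() uses data masking: its arguments are evaluated with the columns of df in scope first, and only names not found there fall back to the function's own environment and its enclosing environments. That is why y never triggers an "object not found" error.
    • The dv argument is a character scalar within the foo_explicit environment. It is found by that fallback because df has no column named dv; writing .env$dv would make this explicit.
    • The response and treat arguments are unquoted expressions. The curly-curly operator is just a shortcut for the enquo + !! construct that you use in foo_new.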
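The comments above can be checked with a small sketch. It reuses df and dv from the question; the outer variable z and the helper names pick_embrace and pick_enquo are hypothetical and only serve the illustration.

```r
library(dplyr)

df <- data.frame(x = rep(c(3, NA_real_), 5),
                 y = letters[1:10],
                 z = 1:10)
dv <- "c"

# A bare `y` resolves to the column df$y because filter() evaluates its
# arguments with the columns of df in scope (data masking).
bare <- df %>% filter(y %in% dv)

# The pronouns spell out where each name comes from.
explicit <- df %>% filter(.data$y %in% .env$dv)
print(identical(bare, explicit))

# When a column and an outer variable share a name, the column wins.
z <- 100                                          # hypothetical outer variable
masked   <- df %>% filter(z > 5)                  # uses df$z
from_env <- df %>% filter(.data$z > .env$z)       # column z compared with 100
print(nrow(masked))    # 5
print(nrow(from_env))  # 0

# {{ }} behaves like enquo() followed by !!
pick_embrace <- function(data, col) data %>% select({{ col }})
pick_enquo <- function(data, col) {
  col <- enquo(col)
  data %>% select(!!col)
}
same_selection <- identical(pick_embrace(df, z), pick_enquo(df, z))
print(same_selection)
```

Using .data and .env in package code is a defensive choice: the result no longer depends on whether a column happens to share a name with a variable in the calling environment.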