Search code examples
rdplyrnon-standard-evaluation

Magic in the way R evaluates function arguments


Consider the following R code:

y1 <- dataset %>% dplyr::filter(W == 1) 

This works, but there seems to some magic here. Usually, when we have an expression like foo(bar), we should be able to do this:

baz <= bar
foo(baz)

However, in the presented code snippet, we cannot evaluate W == 1 outside of dplyr::filter()! W is not a defined variable.

What's going on?


Solution

  • dplyr uses a concept called Non-standard Evaluation (NSE) to make columns from the data frame argument accessible to its functions without quoting or using dataframe$column syntax. Basically:

    [Non-standard evaluation] is a catch-all term that means they don’t follow the usual R rules of evaluation. Instead, they capture the expression that you typed and evaluate it in a custom way.1

    In this case, the custom evaluation takes the argument(s) given to dplyr::filter, and parses them so that W can be used to refer to the dataset$W. The reason that you can't then take that variable and use it elsewhere is that NSE is only applied to the scope of the function.


    NSE makes a trade-off: functions which modify scope are less safe and/or unusable in programming where you're building a program that uses functions to modify other functions:

    This is an example of the general tension between functions that are designed for interactive use and functions that are safe to program with. A function that uses substitute() might reduce typing, but it can be difficult to call from another function.2

    For example, if you wanted to write a function which would use the same code, but swap out W == 1 for W == 0 (or some completely different filter), NSE would make that more difficult to accomplish.

    In 2017 the tidyverse started to build a solution to this in tidy evaluation.