Tags: r, dplyr, tidyeval, quosure

dplyr .data pronoun vs "quosure" approach


dplyr v0.7.0 introduced the .data pronoun, which lets us refer to variables by name using strings. I am curious whether this approach is preferred over the "quosure" approach. For example, here is an approach that uses the .data pronoun:

varname <- "gear"
data_pronoun_method_df <- dplyr::mutate(mtcars, new_col = .data[[varname]] + 2)

Compare this with an example that uses the quosure approach:

quo_varname <- rlang::quo(gear)
quo_method_df <- dplyr::mutate(mtcars, new_col = !! quo_varname + 2)

Both methods produce the same output:

data_pronoun_method_df

#     mpg cyl  disp  hp drat    wt  qsec vs am gear carb new_col
# 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4       6
# 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4       6
# 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1       6
# 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1       5
# 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2       5
# 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1       5
# 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4       5
# 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2       6
# [ reached getOption("max.print") -- omitted 24 rows ]

all.equal(data_pronoun_method_df, quo_method_df)
# [1] TRUE
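In practice, the difference shows up mostly inside your own functions. The sketch below wraps each approach in a helper; the names `add_two_str()`, `add_two_sym()` and the result objects are hypothetical, chosen only for illustration. The `.data` version takes the column name as a string, while the quosure version captures a bare column name with `rlang::enquo()` and unquotes it with `!!`. It assumes reasonably recent versions of dplyr and rlang are installed.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: the column name is supplied as a string and looked up via .data
add_two_str <- function(df, var) {
  mutate(df, new_col = .data[[var]] + 2)
}

# Hypothetical helper: the column is supplied as a bare name and captured as a quosure
add_two_sym <- function(df, var) {
  var <- enquo(var)
  mutate(df, new_col = (!!var) + 2)
}

by_string <- add_two_str(mtcars, "gear")
by_symbol <- add_two_sym(mtcars, gear)

# A string can also be converted to a symbol with sym() and unquoted,
# which bridges the two styles
by_sym_from_string <- mutate(mtcars, new_col = (!!sym("gear")) + 2)

all.equal(by_string, by_symbol)
all.equal(by_string, by_sym_from_string)
```

The string-based helper is convenient when column names arrive as data (for example from a character vector), while the quosure-based helper lets callers write bare column names, just as they would with dplyr verbs directly.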

Is there any real difference? What are the advantages and disadvantages of either method?


Solution

  • The .data pronoun can be useful for working around non-standard evaluation (NSE), but it is more or less orthogonal to tidy eval. Its main purpose is to make sure the variable is looked up in the data frame: if the column does not exist, you get an error. By contrast, a bare name can silently pick up an object from the surrounding environment when the data frame has no such column:

    library(dplyr)
    other <- 1e10
    transmute(mtcars, 2 * other)             # Succeeds erroneously: uses the global `other`
    transmute(mtcars, 2 * .data[["other"]])  # Fails: mtcars has no `other` column
    

    Using the .data pronoun is also more reliable than referring to the data frame explicitly (e.g. mtcars$am), because the data might be grouped:

    group_by(mtcars, cyl) %>%
      transmute(2L * .data[["am"]])
    

    In that example, within each group .data[["am"]] refers only to the slice of the am column that corresponds to that level of cyl.

    Edit: For completeness, you can accomplish the same thing with quosures and quasiquotation. If you create a quosure wrapping a symbol and give it the empty environment as its environment, the symbol lookup succeeds only if the data frame contains a column of that name:

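    library(dplyr)
    library(rlang)
    other <- 1e10
    quo <- new_quosure(quote(other), empty_env())  # quosure over `other`, scoped to the empty env
    transmute(mtcars, 2L * !!quo)  # Fails: mtcars has no `other` column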
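The behaviours described in this answer (a bare name silently falls back to the surrounding environment, `.data` refuses to, a quosure over the empty environment behaves like `.data`, and `.data` works per group) can be checked together in a small sketch. It assumes dplyr and rlang are installed. The helper `fails()` and the result names are hypothetical; `tryCatch()` is used only so the failing calls report `TRUE` instead of stopping the script.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: TRUE if evaluating `expr` raises an error, FALSE otherwise
fails <- function(expr) {
  tryCatch({ expr; FALSE }, error = function(e) TRUE)
}

other <- 1e10

# Bare name: mtcars has no `other` column, so the global `other` is used silently
bare <- transmute(mtcars, x = 2 * other)

# .data pronoun: lookup is restricted to the data, so this call errors
pronoun_fails <- fails(transmute(mtcars, x = 2 * .data[["other"]]))

# Quosure whose environment is the empty env: lookup is also restricted to the data
q <- new_quosure(quote(other), empty_env())
quosure_fails <- fails(transmute(mtcars, x = 2L * !!q))

# Grouped data: .data[["am"]] is evaluated separately within each cyl group
grouped <- mtcars %>%
  group_by(cyl) %>%
  transmute(x = 2L * .data[["am"]])

head(bare$x)
pronoun_fails
quosure_fails
head(grouped)
```

Because the grouped `transmute()` keeps the original row order, `grouped$x` should line up with `2 * mtcars$am` row by row; the grouping only changes which slice `.data[["am"]]` sees during evaluation.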