r dplyr

Difference between .data and cur_data()


library(dplyr)  # provides %>%, mutate() and cur_data()
m <- 10
mtcars %>% dplyr::mutate(disp = .data$disp * .env$m)

is equivalent to

m <- 10
mtcars %>% dplyr::mutate(disp = cur_data()$disp * .env$m)

I am told that cur_data() and .data are not interchangeable in all contexts.

Can you give an example where cur_data() and .data yield different results?


Solution

  • Within group_by(), .data still includes all columns, but cur_data() excludes the grouping column(s). In the example below, cur_data()[["cyl"]] is NULL because cyl is a grouping column. Assigning NULL in mutate() creates no column, so x does not appear in the result, whereas y does.

    library(dplyr)
    
    mtcars %>%
      group_by(cyl) %>%
      mutate(x = cur_data()[["cyl"]], y = .data[["cyl"]]) %>%
      ungroup() %>%
      names()
    ##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
    ## [11] "carb" "y"   
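
    The same difference can be inspected per group. The following is a minimal sketch, assuming a dplyr version in which cur_data() is still available (newer versions emit a deprecation warning); the names per_group, cur_data_cols and data_cyl are made up for illustration.

    ```r
    library(dplyr)

    # For each cyl group, record which columns cur_data() exposes and
    # read the grouping column through the .data pronoun.
    per_group <- mtcars %>%
      group_by(cyl) %>%
      summarise(
        cur_data_cols = list(names(cur_data())),  # grouping column is left out
        data_cyl = .data[["cyl"]][1]              # grouping column is reachable
      )

    # Is "cyl" among the columns that cur_data() saw in the first group?
    "cyl" %in% per_group$cur_data_cols[[1]]

    # Value of cyl obtained through .data in each group
    per_group$data_cyl
    ```

    Here cur_data() is called only in the first expression, so the columns it reports are not affected by summaries created later in the same call.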
    

    Update

    Since this question was answered:

    • dplyr (as of version 1.1.0) has deprecated cur_data() in favor of pick(). However, pick(cyl) raises an error rather than returning NULL when cyl is a grouping variable; pick(everything())[["cyl"]] still returns NULL instead of an error.
    • mutate() now has an optional .by argument for per-operation grouping, which can replace the group_by()/ungroup() pair.

    So to reproduce the result above we would now write

    mtcars %>%
      mutate(x = pick(everything())[["cyl"]], y = .data[["cyl"]], .by = cyl) %>%
      names()
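
    The pick(cyl) behaviour mentioned in the update can be checked without aborting a script by wrapping the call in tryCatch(). This is only a sketch, assuming dplyr >= 1.1.0 (where pick() and .by exist); the variable name outcome and the strings it holds are illustrative.

    ```r
    library(dplyr)

    # Assumption: dplyr >= 1.1.0 is installed.
    # Try to select the grouping column directly with pick() and
    # record whether the call fails.
    outcome <- tryCatch(
      {
        mtcars %>% mutate(x = pick(cyl)[["cyl"]], .by = cyl)
        "no error"
      },
      error = function(e) "error"
    )

    outcome
    ```

    If outcome is "error", selecting the grouping column by name failed, which is why the pick(everything())[["cyl"]] form is used when a NULL result is wanted.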