r dplyr tidyverse rlang tidyeval

R - dplyr: How do I reference a column created in a function with the walrus operator :=?


I am trying to write a function in which I reference a column that I created earlier in the same function using {{}} and :=. How can I reference the "{{col}}_d" column?

library(tidyverse)

data <- tibble(
    a = seq(1,10),
    b = sample(c("a", "b", "c"), 10, replace = T),
    c = rnorm(10, 100, 10)
    )

data_func <- function(df, col) {
    df %>% 
        group_by({{col}}) %>% 
        mutate(
            "{{col}}_d" := a * c,
            "{{col}}_e" := "{{col}}_d" * 10
        )
}

data %>% 
    data_func(b)
#> Error in `mutate()`:
#> ℹ In argument: `b_e = "{{col}}_d" * 10`.
#> ℹ In group 1: `b = "a"`.
#> Caused by error in `"{{col}}_d" * 10`:
#> ! non-numeric argument to binary operator

#> Backtrace:
#>      ▆
#>   1. ├─data %>% data_func(b)
#>   2. ├─global data_func(., b)
#>   3. │ └─... %>% ...
#>   4. ├─dplyr::mutate(...)
#>   5. ├─dplyr:::mutate.data.frame(...)
#>   6. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
#>   7. │   ├─base::withCallingHandlers(...)
#>   8. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
#>   9. │     └─mask$eval_all_mutate(quo)
#>  10. │       └─dplyr (local) eval()
#>  11. └─base::.handleSimpleError(...)
#>  12.   └─dplyr (local) h(simpleError(msg, call))
#>  13.     └─rlang::abort(message, class = error_class, parent = parent, call = error_call)

Created on 2023-02-12 with reprex v2.0.2

I expected the previously created column to be used when computing the next new column.


Solution

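  • The glue syntax "{{col}}_d" is only interpolated in the name on the left-hand side of :=. On the right-hand side it is an ordinary character string, which is why multiplying it by 10 fails. One way to reference the dynamic column is to build its name with rlang::englue(), convert it to a symbol with sym(), and unquote it with !!: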

    library(tidyverse)
    library(rlang)
    
    data <- tibble(
      a = seq(1,10),
      b = sample(c("a", "b", "c"), 10, replace = T),
      c = rnorm(10, 100, 10)
    )
    
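    data_func <- function(df, col) {
      # Build the new column's name as a string, e.g. "b_d" when col is b
      col_nm <- rlang::englue("{{ col }}_d")

      df %>%
        group_by({{ col }}) %>%
        mutate(
          "{{ col }}_d" := a * c,
          # sym() turns the string into a symbol; !! injects it as a column reference
          "{{ col }}_e" := !!sym(col_nm) * 10
        )
    }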
    
    data %>% 
      data_func(b)
    #> # A tibble: 10 × 5
    #> # Groups:   b [3]
    #>        a b         c    b_d    b_e
    #>    <int> <chr> <dbl>  <dbl>  <dbl>
    #>  1     1 b      75.6   75.6   756.
    #>  2     2 a     104.   208.   2082.
    #>  3     3 b     113.   340.   3398.
    #>  4     4 a      92.3  369.   3690.
    #>  5     5 a      92.7  464.   4637.
    #>  6     6 c      99.1  594.   5944.
    #>  7     7 a      96.1  673.   6725.
    #>  8     8 a     107.   854.   8536.
    #>  9     9 b      95.4  859.   8589.
    #> 10    10 a     106.  1061.  10608.
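
    To see why the original attempt failed, it can help to look at what each piece evaluates to on its own. The following is a minimal sketch rather than part of the answer above; name_d is a hypothetical helper introduced only for illustration.

    ```r
    library(rlang)

    # Hypothetical helper: returns the glued column name as a plain string
    name_d <- function(col) englue("{{ col }}_d")

    nm <- name_d(b)
    nm
    #> [1] "b_d"

    # On the right-hand side of :=, a quoted name is only a string, so arithmetic fails
    msg <- tryCatch("b_d" * 10, error = function(e) conditionMessage(e))
    msg
    #> [1] "non-numeric argument to binary operator"

    # sym() converts the string into a symbol that !! can inject as a column reference
    is.symbol(sym(nm))
    #> [1] TRUE
    ```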
    

    Another way is to use the .data pronoun, which looks the column up by its string name:

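    data_func <- function(df, col) {
      # Build the new column's name as a string, e.g. "b_d" when col is b
      col_nm <- rlang::englue("{{ col }}_d")

      df %>%
        group_by({{ col }}) %>%
        mutate(
          "{{ col }}_d" := a * c,
          # .data[[...]] subsets the current data by the string held in col_nm
          "{{ col }}_e" := .data[[col_nm]] * 10
        )
    }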
    

    Created on 2023-02-13 with reprex v2.0.2
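
    A possible variation, sketched here under the assumption that building both names up front is acceptable: glue each name once with englue() and inject it on the left-hand side with !!. The names data_func2, d_nm, e_nm and toy are hypothetical, and the small fixed tibble replaces the random data only so that the result can be checked by hand.

    ```r
    library(dplyr)
    library(rlang)

    # Hypothetical variant: both column names are built once and reused
    data_func2 <- function(df, col) {
      d_nm <- englue("{{ col }}_d")
      e_nm <- englue("{{ col }}_e")

      df %>%
        group_by({{ col }}) %>%
        mutate(
          !!d_nm := a * c,                 # string name injected on the left-hand side
          !!e_nm := .data[[d_nm]] * 10     # same string used to look the column up
        ) %>%
        ungroup()
    }

    # Deterministic toy data so the arithmetic is easy to verify
    toy <- tibble(a = 1:3, b = c("x", "y", "x"), c = c(10, 20, 30))
    out <- data_func2(toy, b)

    out$b_d
    #> [1] 10 40 90
    out$b_e
    #> [1] 100 400 900
    ```

    Keeping the names in variables avoids repeating the glue string, at the cost of a slightly less compact mutate() call.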