Search code examples
rdplyrtidyevalnse

How to define the RHS of case_when() with existing and newly defined columns passed as arguments in user defined function?


Sample code:

library(tidyverse)

iris <- iris

test_tidyeval <- function(data, col_name, col_name_2, column1) {
  
  mutate(
    data,
    {{col_name}} := case_when(Species == "setosa" ~ column1 + Sepal.Width + Petal.Length,
                TRUE ~ column1),
    {{col_name_2}} := case_when(Species == "setosa" ~ {{col_name}} + 100,
                TRUE ~ {{col_name}} + 500))
  
}

iris %>% test_tidyeval("new_column_test", "new_column_test_2", Sepal.Length)

I'm sure this is a tidyeval/nse issue which I can never get my head around.

What I basically want is for new_column_test to be created where if the row Species == "setosa" then for this to be the sum of Sepal.Length, which we're passing to column1 in the user-defined function, Sepal.Width and Petal.length, else just return the value from Sepal.Length, then for new_column_test_2 to add 100 to new_column_test with the same logical condition used previously and 500 to non setosa species.

I can seem to manipulate the LHS of case_when okay but I'm stuck on the RHS statements.


Solution

  • You need to be careful when mixing strings and symbols. They behave differently. You use {{ }} when working with symbols and .data[[]] when working with strings. This should work

    test_tidyeval <- function(data, col_name, col_name_2, column1) {
      
      mutate(
        data,
        "{col_name}" := case_when(Species == "setosa" ~ {{column1}} + Sepal.Width + Petal.Length,
                                TRUE ~ {{column1}}),
        "{col_name_2}":= case_when(Species == "setosa" ~ .data[[col_name]] + 100,
                                    TRUE ~ .data[[col_name]] + 500))
      
    }
    iris %>% test_tidyeval("new_column_test", "new_column_test_2", Sepal.Length) %>% str()
    #'data.frame':  150 obs. of  7 variables:
    # $ Sepal.Length     : num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    # $ Sepal.Width      : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
    # $ Petal.Length     : num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
    # $ Petal.Width      : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
    # $ Species          : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
    # $ new_column_test  : num  10 9.3 9.2 9.2 10 11 9.4 9.9 8.7 9.5 ...
    # $ new_column_test_2: num  110 109 109 109 110 ...
    

    If you passed everything as symbols, it would look like this

    test_tidyeval <- function(data, col_name, col_name_2, column1) {
      
      mutate(
        data,
        "{{col_name}}" := case_when(Species == "setosa" ~ {{column1}} + Sepal.Width + Petal.Length,
                                TRUE ~ {{column1}}),
        "{{col_name_2}}":= case_when(Species == "setosa" ~ {{col_name}} + 100,
                                    TRUE ~ {{col_name}} + 500))
      
    }
    
    iris %>% test_tidyeval(new_column_test, new_column_test_2, Sepal.Length)