r dplyr rlang tidyeval nse

Using tidyeval-based non-standard evaluation in recode() on the right-hand side of mutate()


Consider a tibble where each column is a character vector which can take many values -- let's say "A" through "F".

library(tidyverse)
sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

I wish to create a function which takes a column name as an argument, and recodes that column so that any answer "A" becomes an NA and the df is otherwise returned as is. The reason for designing it this way is to fit into a broader pipeline that performs a series of operations using a given column.

There are many ways to do this, but I am interested in understanding what the most idiomatic tidyeval/tidyverse approach would be. First, the question name needs to be on the left-hand side of a mutate verb, so we use the !! and := operators appropriately. But then, what to put on the right-hand side?

fix_question <- function(df, question) {
    df %>% mutate(!!question := recode(... something goes here...))
}

fix_question(sample_df, "q1") # should produce a tibble whose first column is (NA, "B", "C")

My initial thought was that this would work:

df %>% mutate(!!question := recode(!!question, "A" = NA_character_))

But of course the bang-bang inside recode() just unquotes the literal character string (e.g. "q1"), so recode() sees a constant rather than the column. I ended up taking what feels like a hacky route to reference the data on the right-hand side, using the base R [[ operator and relying on the . placeholder supplied by the %>% pipe, and it works, so in a sense I have solved my underlying problem:

df %>% mutate(!!question := recode(.[[question]], "A" = NA_character_))
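One way to see why the first attempt fails is to capture the expression with rlang::expr() instead of evaluating it. This is only an inspection sketch: the two variable names holding the captured calls are made up for illustration, and nothing here touches the data.

```r
library(rlang)

question <- "q1"

# Unquoting a string inlines the string itself, so the first argument
# of recode() becomes the constant "q1".
as_string_call <- expr(recode(!!question, "A" = NA_character_))

# Unquoting a symbol inlines a reference to the column q1 instead.
as_symbol_call <- expr(recode(!!sym(question), "A" = NA_character_))

print(as_string_call)
print(as_symbol_call)
```

Printing the two calls side by side shows that only the second one refers to a column, which is what mutate() needs in order to look up the data.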

I'm interested in getting feedback from people who are very good at tidyeval as to whether there is a more idiomatic way to do this, in hopes that seeing a worked example would enhance my understanding of the tidyeval function set more generally. Any thoughts?


Solution

  • Here, on the right-hand side of :=, we can use sym to convert the string to a symbol and then unquote it with !!, so that recode receives the column rather than the string

    fix_question <- function(df, question) {
        df %>%
            mutate(!!question := recode(!!rlang::sym(question), "A" = NA_character_))
    }
    
    fix_question(sample_df, "q1") 
    # A tibble: 3 x 2
    #  q1    q2   
    #  <chr> <chr>
    #1 <NA>  B    
    #2 B     B    
    #3 C     A    
    

    A better approach, which works for both quoted and unquoted input, is ensym. It captures the argument as a symbol whether it was passed as a bare name or as a string:

    fix_question <- function(df, question) {
        question <- ensym(question)
        df %>%
            mutate(!!question := recode(!!question, "A" = NA_character_))
    }
    
    
    fix_question(sample_df, q1)
    # A tibble: 3 x 2
    #  q1    q2   
    #  <chr> <chr>
    #1 <NA>  B    
    #2 B     B    
    #3 C     A    
    
    fix_question(sample_df, "q1")
    # A tibble: 3 x 2
    #  q1    q2   
    #  <chr> <chr>
    #1 <NA>  B    
    #2 B     B    
    #3 C     A
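
If the function only ever needs to accept the column name as a string, another option worth considering is the .data pronoun, which looks the column up by name without converting it to a symbol first. The following is a minimal sketch under that assumption; fix_question_str is a hypothetical name chosen so as not to clash with the versions above.

```r
library(dplyr)

sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

# String-only variant (hypothetical helper name): .data[[question]]
# retrieves the column called `question` from the data being mutated.
fix_question_str <- function(df, question) {
  df %>%
    mutate(!!question := recode(.data[[question]], "A" = NA_character_))
}

result <- fix_question_str(sample_df, "q1")
print(result)
```

Unlike .[[question]], this does not depend on the . placeholder of the %>% pipe, so the same mutate() call also works when the data frame is passed in directly.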