Tags: r, dplyr, nse

Error when using NSE (in dplyr): object 'value' not found


I'm trying to get familiar with using NSE in my code where warranted. Let's say I have pairs of columns and want to generate a new string variable for each pair indicating whether the values in that pair are the same.

library(tidyverse)
library(magrittr)

df <- tibble(one.x = c(1,2,3,4),
             one.y = c(2,2,4,3),
             two.x = c(5,6,7,8),
             two.y = c(6,7,7,9),
             # not used but also in df
             extra = c(5,5,5,5))

I'm trying to write code that would accomplish the same thing as the following code:

df.mod <- df %>%
  # is one.x the same as one.y?
  mutate(one.x_suffix = case_when( 
    one.x == one.y ~ "same",
    TRUE ~ "different")) %>%
  # is two.x the same as two.y?
  mutate(two.x_suffix = case_when(
    two.x == two.y ~ "same",
    TRUE ~ "different"))

df.mod
#> # A tibble: 4 x 6
#>   one.x one.y two.x two.y one.x_suffix two.x_suffix
#>   <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
#> 1    1.    2.    5.    6. different    different   
#> 2    2.    2.    6.    7. same         different   
#> 3    3.    4.    7.    7. different    same        
#> 4    4.    3.    8.    9. different    different

In my actual data I have an arbitrary number of such pairs (e.g. three.x and three.y, ...), so I want to write a more general procedure using mutate_at.

My strategy is to pass in the ".x" variables as the .vars and then use gsub to replace "x" with "y" on one side of the equality test inside the case_when, like so:

df.mod <- df %>%
  mutate_at(vars(one.x, two.x),
            funs(suffix = case_when(
              . == !!sym(gsub("x", "y", deparse(substitute(.)))) ~ "same",
              TRUE ~ "different")))
#> Error in mutate_impl(.data, dots): Evaluation error: object 'value' not found.

This is where I get the error. The gsub portion on its own seems to work fine:

df.debug <- df %>%
  mutate_at(vars(one.x, two.x),
            funs(suffix = gsub("x", "y", deparse(substitute(.)))))
df.debug
#> # A tibble: 4 x 6
#>   one.x one.y two.x two.y one.x_suffix two.x_suffix
#>   <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
#> 1    1.    2.    5.    6. one.y        two.y       
#> 2    2.    2.    6.    7. one.y        two.y       
#> 3    3.    4.    7.    7. one.y        two.y       
#> 4    4.    3.    8.    9. one.y        two.y

So it seems to be the !!sym() operation that's causing the error. What have I done wrong?

Created on 2018-11-07 by the reprex package (v0.2.1)


Solution

  • The problem is not in !!sym, as you can see in the following example:

    df %>% mutate_at( vars(one.x, two.x),
                      funs(suffix = case_when(
                        . == !!sym("one.y") ~ "same",
                        TRUE ~ "different")))
    # # A tibble: 4 x 6
    #   one.x one.y two.x two.y one.x_suffix two.x_suffix
    #   <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
    # 1     1     2     5     6 different    different   
    # 2     2     2     6     7 same         different   
    # 3     3     4     7     7 different    different   
    # 4     4     3     8     9 different    different   
    

    The problem is in trying to unquote substitute(.) inside case_when:

    df %>% mutate_at( vars(one.x, two.x),
                      funs(suffix = case_when(
                        . == !!substitute(.) ~ "same",
                        TRUE ~ "different")))
    # Error in mutate_impl(.data, dots) : 
    #   Evaluation error: object 'value' not found.
    

    The reason for this lies in when and where !! is evaluated. From the help page for !!:

    The !! operator unquotes its argument. It gets evaluated immediately in the surrounding context.

    In the example above, the surrounding context for !!substitute(.) is the formula, which is itself inside case_when. As a result, the expression is immediately replaced with value, a variable defined inside case_when that has no meaning in your data frame.
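A minimal sketch of what "evaluated immediately in the surrounding context" means: `!!` is resolved at the moment the expression is captured, using whatever variables are visible there. The helper `capture_inside()` and its local `value` are hypothetical, chosen only to echo the internal `value` from the error message; this illustrates the general rule quoted above and is not a trace of case_when's internals.

```r
library(rlang)

value <- "global value"

# Hypothetical helper with its own local variable called `value`
capture_inside <- function() {
  value <- "local value inside the function"
  # `!!value` is evaluated right here, while expr() captures the call,
  # so it picks up the local `value` rather than the global one
  expr(print(!!value))
}

captured <- capture_inside()
captured
#> print("local value inside the function")
```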

    You want to keep expressions next to their environment, which is what quosures are for. By replacing substitute with rlang::enquo, you capture the expression that gave rise to . along with its defining environment (your data frame). To keep things tidy, let's move your gsub manipulation into a separate function:

    x2y <- function(.x)
    {
      ## Capture the expression and its environment
      qq <- enquo(.x)
    
      ## Retrieve the expression and deparse it
      txt <- rlang::get_expr(qq) %>% rlang::expr_deparse()
    
      ## Replace x with y, as before
      txty <- gsub("x", "y", txt)
    
      ## Put the new expression back into the quosure
      rlang::set_expr( qq, sym(txty) )
    }
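To see the moving parts of x2y in isolation, here is a small sketch outside of mutate_at: build a quosure, swap its expression for a different column symbol with set_expr() (the same call x2y uses), and evaluate the result against a data frame with eval_tidy(). The two-column `dat` is an assumed cut-down copy of df from the question.

```r
library(rlang)

# Cut-down copy of the question's data (assumption for this sketch)
dat <- data.frame(one.x = c(1, 2, 3, 4),
                  one.y = c(2, 2, 4, 3))

# A quosure: the expression `one.x` plus the environment it was created in
q_x <- quo(one.x)

# Same environment, new expression: the symbol `one.y`
q_y <- set_expr(q_x, sym("one.y"))

# eval_tidy() looks symbols up in the data first, then in the quosure's env
eval_tidy(q_x, data = dat)
#> [1] 1 2 3 4
eval_tidy(q_y, data = dat)
#> [1] 2 2 4 3
```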
    

    You can now use the new x2y function directly in your code. With quosures, no unquoting is necessary because the expressions already carry their environments with them; you can simply evaluate them using rlang::eval_tidy:

    df %>% mutate_at(vars(one.x, two.x),
                     funs(suffix = case_when(
                       . == rlang::eval_tidy(x2y(.)) ~ "same",
                       TRUE ~ "different" )))
    # # A tibble: 4 x 6
    #   one.x one.y two.x two.y one.x_suffix two.x_suffix
    #   <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
    # 1     1     2     5     6 different    different   
    # 2     2     2     6     7 same         different   
    # 3     3     4     7     7 different    same        
    # 4     4     3     8     9 different    different   
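One caveat that applies to x2y (and to the original gsub call): gsub("x", "y", ...) replaces every x in the deparsed name, not only the suffix. That is harmless for one.x and two.x, but a hypothetical pair such as six.x / six.y would be mapped to siy.y. A sketch of anchoring the pattern to the end of the name:

```r
# Hypothetical column names; "six.x" contains a second "x"
x_cols <- c("one.x", "two.x", "six.x")

# gsub() replaces every "x", including the one inside "six"
gsub("x", "y", x_cols)
#> [1] "one.y" "two.y" "siy.y"

# Anchoring to a literal ".x" at the end only touches the suffix
sub("\\.x$", ".y", x_cols)
#> [1] "one.y" "two.y" "six.y"
```

Swapping this sub() call into x2y in place of gsub() should be enough if your real column names can contain other x characters.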
    

    EDIT to address the question in your comment: Mushing all your code into a single line is almost always A Bad Idea™, and I strongly advise against it. However, since this question is about NSE, I think it's important to understand why simply taking the content of x2y and pasting it inside case_when leads to problems.

    enquo(), like substitute(), looks in the calling environment of the function and replaces the argument with the expression that was provided to that function. substitute() goes only one level up (which is how it found value inside case_when when you unquoted it), while enquo() keeps moving up as long as the functions on the call stack correctly handle quasiquotation (and most dplyr/tidyverse functions do). So when you call enquo(.x) inside x2y, it follows the expressions provided to each function on the call stack until it eventually finds one.x.
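The difference between substitute() and enquo() when an argument passes through an intermediate function can be seen in a small sketch. The functions `inner_sub()`, `outer_sub()`, `inner_quo()` and `outer_quo()` are hypothetical, and the intermediate parameter is deliberately named `value` to echo the error message. In this sketch the intermediate function forwards its argument explicitly with `!!enquo()`, which is one way a function can "correctly handle quasiquotation".

```r
library(rlang)

# substitute() only sees the expression passed to its *own* function
inner_sub <- function(arg) substitute(arg)
outer_sub <- function(value) inner_sub(value)
outer_sub(one.x)
#> value

# enquo() with explicit forwarding: the intermediate function passes the
# captured quosure on with !!, so the caller's original expression survives
inner_quo <- function(arg) enquo(arg)
outer_quo <- function(value) inner_quo(!!enquo(value))
quo_get_expr(outer_quo(one.x))
#> one.x
```

Neither call ever evaluates one.x, so the sketch runs even though no such object exists.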

    When you call enquo() inside mutate_at, it is now on the same level as one.x, so it too replaces the argument (one.x in this case) with the expression that defined it (the vector c(1,2,3,4) in this case). This is not what you want. Rather than moving up levels, you now want to stay on the same level as one.x. To do so, use rlang::quo() in place of rlang::enquo():
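A small sketch of the quo() versus enquo() distinction: inside a function, quo() quotes exactly what is written at that level, while enquo() reaches back to the expression the caller supplied for an argument. Inside funs(), mutate_at has already replaced `.` with the column symbol, so quo(.) "staying put" is what captures one.x itself. The helper `compare_captures()` is hypothetical.

```r
library(rlang)

# Hypothetical helper: capture the same argument in two ways
compare_captures <- function(arg) {
  list(
    stay = quo(arg),   # quo(): the literal expression `arg`, at this level
    up   = enquo(arg)  # enquo(): the expression the caller supplied
  )
}

res <- compare_captures(one.x)
quo_get_expr(res$stay)
#> arg
quo_get_expr(res$up)
#> one.x
```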

    library( rlang )   ## To maintain at least a little bit of sanity
    
    df %>% 
     mutate_at(vars(one.x, two.x),
       funs(suffix = case_when(
        . == eval_tidy(set_expr(quo(.), 
                                sym(gsub("x","y", expr_deparse(get_expr(quo(.)))))
                           )
                ) ~ "same",
        TRUE ~ "different" )))
    # Now works as expected
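For comparison, a sketch that avoids deparsing altogether: take the .x column names as strings, derive each .y partner with an anchored sub(), and look the columns up through the .data pronoun. This is an alternative technique, not the approach from the answer above; it assumes dplyr 0.7 or later (for .data and `!!name :=`) and that every .x column has a matching .y column. The loop and the `_suffix` naming simply mirror the question.

```r
library(dplyr)

df <- tibble::tibble(one.x = c(1, 2, 3, 4),
                     one.y = c(2, 2, 4, 3),
                     two.x = c(5, 6, 7, 8),
                     two.y = c(6, 7, 7, 9),
                     extra = c(5, 5, 5, 5))

# All ".x" columns; each is assumed to have a ".y" partner
x_cols <- grep("\\.x$", names(df), value = TRUE)

df.mod <- df
for (nm_x in x_cols) {
  nm_y   <- sub("\\.x$", ".y", nm_x)   # "one.x" -> "one.y"
  new_nm <- paste0(nm_x, "_suffix")    # "one.x" -> "one.x_suffix"
  df.mod <- df.mod %>%
    mutate(!!new_nm := case_when(
      .data[[nm_x]] == .data[[nm_y]] ~ "same",
      TRUE ~ "different"))
}

df.mod %>% select(ends_with("_suffix"))
```

Because the column names stay plain strings throughout, there is no expression to unquote and nothing for case_when to capture by mistake.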