r dplyr tidyverse tidyeval nse

Why does !! (bang-bang) combined with as.name() give a different output compared to !! or as.name() alone?


I use a dynamic variable (e.g. ID) to reference a column name that changes depending on which gene I am processing at the time. I then use case_when within mutate to create a new column whose values depend on the dynamic column.

I thought that !! (bang-bang) was what I needed to force eval of the content of the variable; however, I did not get the expected output in my new column. Only the !!as.name gave me the output I was expecting, and I do not fully understand why. Could someone explain why in this case using only !! isn't appropriate and what is happening in !!as.name?

Here is a simple reproducible example that I made up to demo what I am experiencing:

library(tidyverse)

ID <- "birth_year"

# Correct output
test <- starwars %>%
  mutate(FootballLeague = case_when(
    !!as.name(ID) < 10 ~ "U10",
    !!as.name(ID) >= 10 & !!as.name(ID) < 50 ~ "U50",
    !!as.name(ID) >= 50 & !!as.name(ID) < 100 ~ "U100",
    !!as.name(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))

# Incorrect output
test2 <- starwars %>%
  mutate(FootballLeague = case_when(
    !!(ID) < 10 ~ "U10",
    !!(ID) >= 10 & !!(ID) < 50 ~ "U50",
    !!(ID) >= 50 & !!(ID) < 100 ~ "U100",
    !!(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))

# Incorrect output
test3 <- starwars %>%
  mutate(FootballLeague = case_when(
    as.name(ID) < 10 ~ "U10",
    as.name(ID) >= 10 & as.name(ID) < 50 ~ "U50",
    as.name(ID) >= 50 & as.name(ID) < 100 ~ "U100",
    as.name(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))

identical(test, test2)
# FALSE

identical(test2, test3)
# TRUE

sessionInfo()
#R version 4.0.2 (2020-06-22)
#Platform: x86_64-centos7-linux-gnu (64-bit)
#Running under: CentOS Linux 7 (Core)

# tidyverse_1.3.0
# dplyr_1.0.2

Cheers!


Solution

  • You can wrap your expressions in the function quo() to see the result of the operation after applying the !! operator. For simplicity I will use a shorter expression for demonstration:

    Preparations:

    library(tidyverse)
    ID <- "birth_year"
    
    ## Test without quasiquotation:
    starwars %>% 
      filter(birth_year < 50)
    

    Experiment 1:

    quo(
      starwars %>% 
        filter(ID < 50)
    )
    ## result: starwars %>% filter(ID < 50)
    

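    We learn: Without !!, nothing is substituted. The expression is kept exactly as written. When filter() later evaluates it, starwars has no column called ID, so ID resolves to the variable in your environment, i.e. the string "birth_year", and not to the column birth_year. So we need a mechanism to put the content of ID into the expression itself.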

    --> The !! operator evaluates whatever follows it right away and inserts the result into the expression, before filter() gets to see it.

    Experiment 2:

    quo(
      starwars %>% 
        filter(!!ID < 50)
    ) 
    ## result: starwars %>% filter("birth_year" < 50)
    

    We learn: The !! operator has indeed worked: ID was replaced with its value. But: the value of ID is the string "birth_year". Note the quotes in the result. As you probably know, tidyverse functions don't take variable names as strings in this position; they want the bare names, without quotes. So filter() does not look up a column here at all. It compares the literal string "birth_year" with 50, which gives the same single result for every row.
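
To see what the substituted condition from Experiment 2 actually computes, here is a small base R sketch. It needs no packages. The variable names num_cmp and str_cmp are made up for illustration, and the TRUE/FALSE value itself can depend on your locale's sort order, so it is not shown.

```r
# Comparing a string with a number: R first coerces the number to a string,
# so both lines below perform the same string comparison.
num_cmp <- "birth_year" < 50
str_cmp <- "birth_year" < "50"

identical(num_cmp, str_cmp)

# The result is one logical value, not one value per row of starwars,
# so it cannot select rows by their birth year.
length(num_cmp)
```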

    What does the function as.name() do?

    This is a base R function that takes a string (or a variable containing a string) and returns the content of the string as a symbol, i.e. a variable name. So if you call as.name(ID) in base R, the result is birth_year, this time without quotes, just like the tidyverse expects it. So let's try it:
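
A quick way to check what as.name() returns, using base R only (the variable name sym is just for illustration):

```r
ID <- "birth_year"
sym <- as.name(ID)

sym                                # prints birth_year, without quotes
is.symbol(sym)                     # TRUE: a symbol, not a character string
is.character(sym)                  # FALSE
identical(sym, quote(birth_year))  # TRUE: same as typing the bare name
```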

    Experiment 3:

    quo(
      starwars %>% 
        filter(as.name(ID) < 50)
    ) 
    ## result: starwars %>% filter(as.name(ID) < 50)
    

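    We learn: This did not work either. Without !!, the call as.name(ID) is kept in the expression as written and is only evaluated later, when filter() computes the condition. At that point its value is a symbol object, not the column birth_year, and when R compares a symbol with a number it deparses the symbol to the string "birth_year". So the outcome is the same string comparison as in Experiment 2, which is also why test2 and test3 in your question are identical.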
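
The base R sketch below illustrates why Experiment 3 ends up behaving like Experiment 2. It relies on the documented behaviour of R's comparison operators, which deparse symbols to character strings before comparing. The names sym_cmp and str_cmp are illustrative only.

```r
ID <- "birth_year"

# Comparing a symbol with a number: R deparses the symbol to the string
# "birth_year", then compares strings, exactly as in Experiment 2.
sym_cmp <- as.name(ID) < 50
str_cmp <- "birth_year" < 50

identical(sym_cmp, str_cmp)
```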

    --> We need to combine the two things to make it work:

    1. Use as.name() to convert the string to a variable name.
    2. Use !! to insert that variable name into the expression, so that filter() sees birth_year instead of the call as.name(ID).

    Experiment 4:

    quo(
      starwars %>% 
        filter(!!as.name(ID) < 50)
    ) 
    ## result: starwars %>% filter(birth_year < 50)
    

    Now it works! :)
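
If you want to replay Experiments 2 and 4 without the tidyverse, base R's bquote() with .() behaves in a loosely similar way to quo() with !!. This is only an analogy to show the substitution step, not how dplyr is implemented. The data frame df and the names e2 and e4 are made up for this sketch.

```r
ID <- "birth_year"
df <- data.frame(birth_year = c(19, 112, 33))

# .(ID) inlines the string, like !!ID in Experiment 2
e2 <- bquote(.(ID) < 50)
# .(as.name(ID)) inlines the symbol, like !!as.name(ID) in Experiment 4
e4 <- bquote(.(as.name(ID)) < 50)

deparse(e2)   # "\"birth_year\" < 50"
deparse(e4)   # "birth_year < 50"

# Evaluate both expressions with the columns of df in scope
length(eval(e2, df))   # 1: a single string comparison
eval(e4, df)           # TRUE FALSE TRUE: one value per row
```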

    I have used filter() in my experiments, but it works the same way with mutate() and the other dplyr functions that capture their arguments, including the case_when() conditions inside mutate() in your question.
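
As a side note that goes beyond the experiments above: dplyr also provides the .data pronoun, which accepts the column name as a string directly, so no as.name() or !! is needed. Below is a sketch of the case_when() from the question written that way. The result name test_data is made up, and the conditions are shortened on the assumption that case_when() checks them in order.

```r
library(dplyr)

ID <- "birth_year"

# .data[[ID]] looks up the column whose name is stored in ID.
# Conditions are tried in order, so each one only needs an upper bound.
test_data <- starwars %>%
  mutate(FootballLeague = case_when(
    .data[[ID]] < 10 ~ "U10",
    .data[[ID]] < 50 ~ "U50",
    .data[[ID]] < 100 ~ "U100",
    .data[[ID]] >= 100 ~ "Senior",
    TRUE ~ "Others"   # rows with a missing birth_year end up here
  ))

test_data %>%
  select(name, birth_year, FootballLeague) %>%
  head()
```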