I use a dynamic variable (e.g. ID) to reference a column whose name changes depending on which gene I am processing at the time. I then use case_when within mutate to create a new column whose values depend on that dynamic column.
I thought that !! (bang-bang) was what I needed to force evaluation of the variable's content; however, I did not get the expected output in my new column. Only !!as.name gave me the output I was expecting, and I do not fully understand why. Could someone explain why using only !! isn't appropriate in this case, and what is happening in !!as.name?
Here is a simple reproducible example that I made up to demo what I am experiencing:
library(tidyverse)
ID <- "birth_year"
# Correct output
test <- starwars %>%
  mutate(FootballLeague = case_when(
    !!as.name(ID) < 10 ~ "U10",
    !!as.name(ID) >= 10 & !!as.name(ID) < 50 ~ "U50",
    !!as.name(ID) >= 50 & !!as.name(ID) < 100 ~ "U100",
    !!as.name(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))
# Incorrect output
test2 <- starwars %>%
  mutate(FootballLeague = case_when(
    !!(ID) < 10 ~ "U10",
    !!(ID) >= 10 & !!(ID) < 50 ~ "U50",
    !!(ID) >= 50 & !!(ID) < 100 ~ "U100",
    !!(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))
# Incorrect output
test3 <- starwars %>%
  mutate(FootballLeague = case_when(
    as.name(ID) < 10 ~ "U10",
    as.name(ID) >= 10 & as.name(ID) < 50 ~ "U50",
    as.name(ID) >= 50 & as.name(ID) < 100 ~ "U100",
    as.name(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))
identical(test, test2)
# FALSE
identical(test2, test3)
# TRUE
sessionInfo()
#R version 4.0.2 (2020-06-22)
#Platform: x86_64-centos7-linux-gnu (64-bit)
#Running under: CentOS Linux 7 (Core)
# tidyverse_1.3.0
# dplyr_1.0.2
Cheers!
You can wrap your expressions in the function quo() to see the result of the operation after applying the !! operator. For simplicity I will use a shorter expression for demonstration:
Preparations:
library(tidyverse)
ID <- "birth_year"
## Test without quasiquotation:
starwars %>%
  filter(birth_year < 50)
Experiment 1:
quo(
  starwars %>%
    filter(ID < 50)
)
## result: starwars %>% filter(ID < 50)
We learn: The expression is captured "as is"; ID is not replaced by its value at this stage. So we need a mechanism to tell filter() to substitute the value of ID into the expression before it is evaluated.
--> The !! operator does exactly that: it evaluates the expression that follows it and inserts the result into the captured expression.
Experiment 2:
quo(
  starwars %>%
    filter(!!ID < 50)
)
## result: starwars %>% filter("birth_year" < 50)
We learn: The !! operator has indeed worked: ID was replaced with its value. But the value of ID is the string "birth_year"; note the quotes in the result. Tidyverse functions refer to columns by bare names (symbols), not by strings. As in Experiment 1, filter() takes the expression "as is", so it does not look up a column at all: it compares the literal string "birth_year" with 50, and every row gets the same result.
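To see what that comparison does on its own, here is a small base R sketch (no tidyverse needed). It relies on R's usual coercion rule that a number compared with a string is converted to a string first. How strings are ordered depends on your locale, so no specific TRUE/FALSE outcome is claimed here.

```r
ID <- "birth_year"

# Comparing a string with a number: the number is coerced to the string "50",
# so both lines below perform the same string comparison.
res_number <- ID < 50
res_string <- ID < "50"

# The result is a single logical value, not one value per row of a data frame
length(res_number)
identical(res_number, res_string)
```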
What does the function as.name() do?
This is a base R function that takes a string (or a variable containing a string) and returns the content of the string as a variable name (a symbol).
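A quick way to check this in plain R is sketched below. The variable name sym is only an illustrative choice.

```r
ID <- "birth_year"

sym <- as.name(ID)   # convert the string into a symbol (a "name" object)

is.character(ID)     # the original value is a string
is.symbol(sym)       # the converted value is a symbol
class(sym)           # the class of a symbol is "name"
as.character(sym)    # converting back gives the original string
```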
So if you call as.name(ID) in base R, the result is birth_year, this time without quotes - just like the tidyverse expects it. So let's try it:
Experiment 3:
quo(
  starwars %>%
    filter(as.name(ID) < 50)
)
## result: starwars %>% filter(as.name(ID) < 50)
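We learn: This did not work, because, again, filter() takes everything "as is": nothing replaces as.name(ID) with its result before evaluation. When the call is finally evaluated, it returns a symbol as an ordinary value, which is never looked up as a column. R simply deparses it to the string "birth_year" for the comparison, so we end up with the same result as in Experiment 2.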
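The comparison from Experiment 3 can also be reproduced outside of filter(). This sketch assumes the base R behaviour described in ?Comparison, where symbols are deparsed to strings before being compared. It is only meant to illustrate why the as.name(ID) version and the !!ID version give the same output.

```r
ID <- "birth_year"

# Without !!, as.name(ID) is just a function call that returns a symbol.
# Base R deparses that symbol to "birth_year" before comparing,
# so this is the same string comparison as in Experiment 2.
from_symbol <- as.name(ID) < 50
from_string <- "birth_year" < 50

identical(from_symbol, from_string)
```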
--> We need to combine the two things to make it work:
- as.name() to convert the string to a variable name.
- !! to tell filter() it should not take things "as is", but substitute the real value.
Experiment 4:
quo(
  starwars %>%
    filter(!!as.name(ID) < 50)
)
## result: starwars %>% filter(birth_year < 50)
Now it works! :)
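To compare the two unquoting variants side by side, the sketch below uses rlang's expr() and eval_tidy() on a tiny made-up data frame (df and its values are illustrative and not taken from starwars). It assumes the rlang package is installed, which is the case whenever the tidyverse is.

```r
library(rlang)

ID <- "birth_year"
df <- data.frame(birth_year = c(19, 112, 41.9))  # hypothetical toy data

with_string <- expr(!!ID < 50)           # unquotes the string held in ID
with_symbol <- expr(!!as.name(ID) < 50)  # unquotes the symbol birth_year

# The symbol version is looked up in the data, giving one value per row
eval_tidy(with_symbol, data = df)

# The string version never touches the column, giving a single value
length(eval_tidy(with_string, data = df))
```

The first call behaves like a normal column comparison, while the second compares a constant string with 50 regardless of what df contains.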
I have used filter() in my experiments, but it works exactly the same with mutate() and other tidyverse functions.
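As an aside, dplyr also provides the .data pronoun, which accepts a column name stored as a string directly. The sketch below is an alternative formulation, not what the question used, and it assumes a reasonably recent dplyr (1.0 or later, as in the sessionInfo() above).

```r
library(dplyr)

ID <- "birth_year"

# Unquoting a symbol, as in Experiment 4
via_symbol <- starwars %>%
  filter(!!as.name(ID) < 50)

# Subsetting the .data pronoun with the string held in ID
via_pronoun <- starwars %>%
  filter(.data[[ID]] < 50)

# Both approaches should select the same rows
identical(via_symbol$name, via_pronoun$name)
```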