I stumbled upon this behaviour and do not quite understand it. Could someone, please, shed some light?
I have written the following function which gives the following error:
> MyFilter <- function(data, filtersVector) {
    filtersVector <- quo(filtersVector)
    result <- data %>% filter(Species %in% !!filtersVector)
    result
  }
> MyFilter(iris, c("setosa", "virginica"))
Error in filter_impl(.data, quo) :
Evaluation error: 'match' requires vector arguments.
However, if I modify it in the following way it is working as expected:
> MyFilter <- function(data, filtersVector) {
    otherName <- quo(filtersVector)
    result <- data %>% filter(Species %in% !!otherName)
    result
  }
> MyFilter(iris, c("setosa", "virginica"))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
I also realize that in a function I should be using enquo() instead, and that works fine.
> MyFilter <- function(data, filtersVector) {
    filtersVector <- enquo(filtersVector)
    result <- data %>% filter(Species %in% !!filtersVector)
    result
  }
> MyFilter(iris, c("setosa", "virginica"))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
However, I am still puzzled by the above behaviour, and any explanation will be appreciated.
TLDR: In the first version, you have created a self-reference (a symbol that points to itself). The other versions work, but you don't actually need quosures or argument capture here because you are not referring to data frame columns. This also explains why the quo() and enquo() versions work the same. You can just pass the argument in the normal way, without any quoting, though it's still a good idea to unquote with !! to avoid any data-masking bug.
You can use qq_show() around the filter() call to discover the differences in syntax:
MyFilter <- function(data, filtersVector) {
  filtersVector <- quo(filtersVector)
  rlang::qq_show(
    result <- data %>% filter(Species %in% !!filtersVector)
  )
}
MyFilter(iris, c("setosa", "virginica"))
#> result <- data %>% filter(Species %in% (^filtersVector))
So here we are asking filter() to find the rows where Species matches the elements of filtersVector. There is no filtersVector column in your data frame, so it looks for a definition in the quosure environment. You have created a quosure with quo(), which records your expression (in this case the symbol filtersVector) and your environment (the environment of your function). So it looks up a filtersVector object, which contains a symbol referring to itself. It is evaluated only once, so there is no infinite loop, but you're effectively trying to compare a vector to a symbol, which is a type error:
"setosa" %in% quote(filtersVector)
#> Error in match(x, table, nomatch = 0L) :
#> 'match' requires vector arguments
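To make the self-reference visible, here is a minimal sketch that inspects the quosure from inside a function. The helper name inspect_self_reference() is hypothetical; quo_get_expr(), quo_get_env(), env_get() and is_quosure() are rlang functions.

```r
library(rlang)

# Hypothetical helper, for illustration only
inspect_self_reference <- function(filtersVector) {
  # Overwrite the argument with a quosure wrapping the symbol `filtersVector`
  filtersVector <- quo(filtersVector)
  list(
    # The captured expression is just the symbol `filtersVector`
    expr = quo_get_expr(filtersVector),
    # Looking that symbol up in the quosure's environment no longer finds
    # the original vector: it finds the quosure we have just assigned
    lookup_is_quosure = is_quosure(
      env_get(quo_get_env(filtersVector), "filtersVector")
    )
  )
}

res <- inspect_self_reference(c("setosa", "virginica"))
is.symbol(res$expr)
res$lookup_is_quosure
```

Both checks should come back TRUE: the name filtersVector now leads back to the quosure rather than to the character vector.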
In your second try, you give another name to the quosure. It now works because filtersVector, in the environment of your function, still represents the argument that was passed to it (a vector).
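A small sketch of why the renamed version resolves correctly, assuming rlang is attached. The helper name inspect_renamed() is hypothetical, and eval_tidy() is used here only to evaluate the quosure outside of filter().

```r
library(rlang)

# Hypothetical helper, for illustration only
inspect_renamed <- function(filtersVector) {
  # The quosure is stored under a different name, so the argument
  # `filtersVector` is left untouched in the function's environment
  otherName <- quo(filtersVector)
  # Evaluating the quosure looks up `filtersVector` in that environment
  # and finds the vector passed by the caller
  eval_tidy(otherName)
}

val <- inspect_renamed(c("setosa", "virginica"))
is.character(val)
```

Here the lookup reaches the original argument, so the result is an ordinary character vector that %in% can work with.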
In the third try, you use enquo() instead. Rather than capturing your expression and your environment, enquo() captures the expression and the environment of the user of your function. Let's use qq_show() again to see the difference:
MyFilter <- function(data, filtersVector) {
  filtersVector <- enquo(filtersVector)
  rlang::qq_show(
    data %>% filter(Species %in% !!filtersVector)
  )
}
MyFilter(iris, c("setosa", "virginica"))
#> data %>% filter(Species %in% (^c("setosa", "virginica")))
Now the quosure contains a call that creates a vector on the spot, which %in% understands perfectly.
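The same thing can be checked without filter(). The sketch below, with a hypothetical helper capture_arg(), returns the expression that enquo() captured from the caller.

```r
library(rlang)

# Hypothetical helper, for illustration only
capture_arg <- function(filtersVector) {
  # enquo() captures what the caller typed, not the symbol `filtersVector`
  filtersVector <- enquo(filtersVector)
  quo_get_expr(filtersVector)
}

captured <- capture_arg(c("setosa", "virginica"))
is.call(captured)   # the captured expression is the unevaluated call to c()

# If the caller passes a variable, the captured expression is that symbol
species <- c("setosa", "virginica")
captured_sym <- capture_arg(species)
is.symbol(captured_sym)
```

In both cases the expression belongs to the caller, so nothing inside the function can accidentally shadow it.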
Note that you're not actually referring to data frame columns, though. You're passing vectors. This means you don't need any quosure at all, and you don't need to capture the expression passed to an argument. enquo() is only useful to delay evaluation until the very end, so that the expression can be evaluated within the data frame. If the quo() and enquo() versions produce the same result, that's a good indication you don't need any quoting at all. Since there is no need for them, let's simplify the function by taking quosures out of the equation:
MyFilter <- function(data, filtersVector) {
  data %>% filter(Species %in% filtersVector)
}
MyFilter(iris, c("setosa", "virginica"))
#> # A tibble: 100 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ... with 90 more rows
It works! But what happens if the data frame contains a filtersVector column? It would take precedence over the object from the environment:
iris %>%
  mutate(filtersVector = "parasite vector") %>%
  MyFilter(c("setosa", "virginica"))
#> # A tibble: 0 x 6
#> # ... with 6 variables: Sepal.Length <dbl>, Sepal.Width <dbl>,
#> # Petal.Length <dbl>, Petal.Width <dbl>, Species <fct>, filtersVector <chr>
So it's still a good idea to unquote, because that will evaluate the vector right away and stick it inside the filter expression. It can no longer be masked by a column. The inlining is shown by qq_show():
MyFilter <- function(data, filtersVector) {
  rlang::qq_show(
    data %>% filter(Species %in% !!filtersVector)
  )
}
MyFilter(iris2, c("setosa", "virginica"))
#> data %>% filter(Species %in% <chr: "setosa", "virginica">)
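Putting it together, here is a sketch of the final function without qq_show(), run against a data frame that does contain a filtersVector column. The variable names masked and out are only for illustration.

```r
library(dplyr)

MyFilter <- function(data, filtersVector) {
  # `!!` evaluates filtersVector right away and inlines the resulting
  # vector, so a column of the same name cannot mask it
  data %>% filter(Species %in% !!filtersVector)
}

# A data frame with a column that would otherwise shadow the argument
masked <- iris %>% mutate(filtersVector = "parasite vector")

out <- MyFilter(masked, c("setosa", "virginica"))
nrow(out)
```

With the unquoting in place, the setosa and virginica rows are kept even though the data frame has its own filtersVector column.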