Search code examples
rdbplyr

how filter() a db-backed tibble using a variable?


I am attempting to filter a database-backed tibble by piping to a filter() command and am observing unexpected behavior:

If I filter using filter(pos == variable), I get the same result regardless of what value I assign to the variable. However, filtering with the value, e.g. filter(pos == 12345) works - the result changes as it should for each different value I'm filtering on.

Is this an aspect of lazy evaluation or tidyeval? What is the correct way to filter() a DB-backed tibble using a variable?

Here's a reproducible example:

library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), path = ":dbname:")

ex_data <- tibble(
  pos = c(10510138, 10510507),
  ref = c("CATA", "TCA"),
  alt = c("C", "T")
)

copy_to(con, ex_data, "variants", temporary = FALSE)

toQueryDB <- tbl(con, "variants")

pos = 10510138
(result <- toQueryDB %>% filter(pos == pos)  %>% select(pos, ref, alt) %>% head(1))
# 10510138 CATA  C 

pos = 10510507
(result <- toQueryDB %>% filter(pos == pos)  %>% select(pos, ref, alt) %>% head(1))
# STILL 10510138 CATA  C !!!

(result <- toQueryDB %>% filter(pos == 10510138)  %>% select(pos, ref, alt) %>% head(1))
# 10510138 CATA  C

(result <- toQueryDB %>% filter(pos == 10510507)  %>% select(pos, ref, alt) %>% head(1))
# 10510507 TCA   T 

DBI::dbDisconnect(con)

And my session info:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.7.6

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18     rstudioapi_0.7   bindr_0.1.1      magrittr_1.5     tidyselect_0.2.4 bit_1.1-14       R6_2.2.2        
 [8] rlang_0.2.2      fansi_0.3.0      blob_1.1.1       tools_3.5.1      utf8_1.1.4       cli_1.0.1        DBI_1.0.0       
[15] dbplyr_1.2.2     yaml_2.2.0       bit64_0.9-7      assertthat_0.2.0 digest_0.6.17    tibble_1.4.2     crayon_1.3.4    
[22] bindrcpp_0.2.2   purrr_0.2.5      memoise_1.1.0    glue_1.3.0       RSQLite_2.1.1    compiler_3.5.1   pillar_1.3.0    
[29] pkgconfig_2.0.2 

Solution

  • When you have

    filter(pos == pos)
    

    dplyr doesn't try to figure out which pos is which and it assumes they both are coming from the data that's piped into the function. If you want to inject the value of a variable from outside the data being passed in, then you need to use the bang-bang rlang operator (!!). You should use

    filter(pos == !!pos)