Search code examples
rdplyrrpostgresql

Case-sensitivity with dplyr + RPostgreSQL string matching


I can't figure out how to apply a non-case-sensitive filter query to a remote PostgreSQL table using dplyr. To demonstrate:

require(dplyr)
require(stringr)
require(RPostgreSQL)

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname="mydb", host="localhost", port=5432, user="username")

# create db table
copy_to(con, iris, "iris", temporary = FALSE)

# dplyr remote database table
iris_pg <- tbl(con, "iris")

iris_pg %>% filter(str_detect(Species, 'setosa')) %>% head(3) %>% collect()
# A tibble: 3 x 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
*        <dbl>       <dbl>        <dbl>       <dbl> <chr>  
1          5.1         3.5          1.4         0.2 setosa 
2          4.9         3            1.4         0.2 setosa 
3          4.7         3.2          1.3         0.2 setosa

iris_pg %>% filter(str_detect(Species, 'Setosa')) %>% head(3) %>% collect()
# A tibble: 0 x 0

To ignore case stringr::fixed('Setosa', ignore_case=TRUE) works with tibble filtering. But with the postgres table it has no effect:

iris_pg %>% filter(str_detect(Species, stringr::fixed('SETOSA', ignore_case=TRUE))) %>% head(3) %>% collect()
# A tibble: 0 x 0

Is anyone aware of a workaround?


Solution

  • It does not work because as you can see here, when using a PostgreSQL backend, dbplyr relies on the case-sensitive function STRPOS to translate str_detect into SQL.

    Some possible workarounds:

    1) filter(str_detect(tolower(myvar), tolower(pattern))) probably works with any relational database.

    2) filter(myvar %~*% pattern) relies on ~*, the PostgreSQL operator for case-insensitive POSIX regexes.

    3) filter(myvar %ilike% paste0("%", pattern, "%")) relies on ILIKE, a case-insensitive and Postgres-specific version of the standard LIKE operator.