Search code examples
rshinydbplyr

Conditional filter of dbplyr query in Shiny app


I'm working on a Shiny application that queries a database using dbplyr but I'm struggling to implement my usual dplyr approach for conditional filters using the proper syntax. It seems like the underlying issue is that dbplyr does not allow you to evaluate an externally-supplied vector, so is.null fails when the user supplies a vector of potential options.

<SQL>
SELECT *
FROM `mtcars`
WHERE (CASE WHEN (((4.0, 6.0) IS NULL)) THEN 1 WHEN (`cyl` IN (4.0, 6.0)) THEN 1 END)

Ultimately, there will be many parameters to evaluate so I don't want to simply put the whole query in an if/else statement as has been suggested in similar SO questions. Any suggestions for how to best implement conditional filters in a Shiny application built on dbplyr?

# load libraries
library(dplyr)
library(dbplyr)

# create database and dataframe
mtcars_db <- tbl_memdb(mtcars)
mtcars_df <- collect(mtcars_db)

# parameterized filter function
filter_function <- function(data, user_selection) {
  {{data}} |>
    filter(
      case_when(
        is.null({{user_selection}}) ~ TRUE,
        cyl %in% {{user_selection}} ~ TRUE
      )
    )
}

# save vector of user selections
cylinders <- c(4, 6)

# works with dataframes
filter_function(mtcars_df, NULL) # works
filter_function(mtcars_df, cylinders) # works
filter_function(mtcars_db, NULL) # works
filter_function(mtcars_db, cylinders) # fails

# show query of failing version
filter_function(mtcars_db, cylinders) |>
  show_query()

Solution

  • I typically recommend the approach by @langtang, but with multiple conditions it is clear this can become impractical.

    The cause of the problem is that SQL does not have a way to test whether an entire list (e.g. (4.0, 6.0)) is null. Hence, one solution is to test only the first value of the list:

    filter_function <- function(data, user_selection) {
      data %>%
        filter(
          cyl %in% user_selection | is.null(local(user_selection[1]))
          )
    }
    

    I have used OR (|) instead of case_when and removed the {{ }} as it works without. But this is only a style choice.

    The key part of this solution is local, which forces evaluation of its contents prior to translation. If we do not include local then dbplyr is trying to translate something like (4.0, 6.0)[1] (which it can not do). But when we use local, dbplyr just have to translate 4 (which is trivial).

    This makes use of the R behavior that NULL[1] is also NULL.

    We have to assume that user_selection never has NULL as a first value. This should generally be the case.