Search code examples
rdplyrgoogle-bigquerydbplyrbigrquery

bigrquery + dplyr error when using functions and sourcing from separate R script


I'm writing a set of common functions to QC tables in a BigQuery database using the dplyr interface, which I'm source()-ing to a series of table-specific RMD files for documentation. I'm not sure how to create a reproducible example since this relates to a DB connection, but here's what I'm seeing and let me know what other info would be useful -

I'm able to run all of the relevant code within the RMD file itself/not as a function just fine, but when I move them to a separate script and convert them to functions I keep running into this error:

Error in `table$variable`:
! The `$` method of <tbl_lazy> is for internal use only.
ℹ Use `dplyr::pull()` to get the values in a column.

Here's an example of the dplyr code (they're all fairly simple functions):

count_records_by_group <- function(table, variable) {
  table %>%
    group_by(variable) %>%
    summarise(count = n()) %>%
    data.frame()
}

I'm also able to run the function successfully if I hardcode a variable name, like this:

count_records_by_group <- function(table) {
  table %>%
    group_by(year) %>%
    summarise(count = n()) %>%
    data.frame()
}

But obviously the point of the function is to allow for different input variables so it doesn't make sense to do that.

The only discussion of this error I can find is from this issue, which looks like it was resolved through some bug fixes on specific packages. The workaround that I've been able use is loading the entire table into memory (not practical for most of the tables I'm working with). I've also tried various methods of entering the variable name (eval(parse("variable")), dplyr::pull(table, variable)). I cant' seem to find another work around or way to force dplyr::pull() over dbplyr::tbl_lazy().

Is this potentially a bug, or is there a workaround that I'm missing? TIA

Package versions:
dplyr: 1.1.4
dbplyr: 2.5.0
bigrquery: 1.5.1


Solution

  • If you want to pass the table name and variable name as function parameter you may try this:

    library(dplyr)
    
    target_table <- data.frame(target_variable = c('A','A','A', 'B', 'B'))
    
    count_records_by_group <- function(table, variable) {
      
      table <- get(table)
      
      table %>%
        group_by(!!sym(variable)) %>%
        summarise(count = n()) %>%
        data.frame()
    }
    
    count_records_by_group('target_table', 'target_variable')
    
    
      target_variable count
    1               A     3
    2               B     2