Search code examples
rdplyramazon-redshiftrpostgresql

dplyr: use a custom function in summarize() when connected to external database


Is there a way to use custom functions within a summaries statement when using dplyr to pull data from an external database? I can’t make usable dummy data because this is specific to databases, but imagine you have a table with three fields: product, true_positive, and all_positive. This is the code I want to use:

getPrecision <- function(true_positive, all_positive){
  if_else(sum(all_positive, na.rm = TRUE) == 0, 0,
          (sum(true_positive) / sum(all_positive , na.rm = TRUE)))
}

database_data %>%
    group_by(product) %>%
    summarize(precision = getPrecision(true_positive, all_positive)) %>% collect

This is the error: Error in postgresqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not Retrieve the result : ERROR: function getprecision(integer, integer) does not exist


Solution

  • To understand the error message, you could use show_query instead of collect to see the SQL code sent to the database :

    database_data %>%
        group_by(product) %>%
        summarize(precision = getPrecision(true_positive, all_positive)) %>%  
        show_query
    
    <SQL>
    SELECT "product", getPrecision("true_positive", "all_positive") AS "precision"
    FROM "database_table"
    GROUP BY "product"
    

    As you can see, this SQL expects getPrecision function to be available on the server, which is not the case.

    A potential solution is to collect table data first, before applying this function in the R client:

    database_data %>%
        collect %>%
        group_by(product) %>%
        summarize(precision = getPrecision(true_positive, all_positive)) 
    

    If this isn't possible, because the table is too big, you'll have to implement the function in SQL on the server :

    SELECT 
      "product", 
      CASE WHEN sum(all_positive)=0 THEN 0 ELSE sum(true_positive)/sum(all_positive) END AS "precision"
    FROM "database_table"
    GROUP BY "product"