Search code examples
rsqldfrpostgresql

Using RpostgreSQL with sqldf crashes R


I am using sqldf to train some r-users in SQL. Loading RPostgreSQL or RH2 before loading sqldf will change the default SQL used by sqldf. This works fine with h2, but every time I load RPostgreSQL, R will crash on first attempted query. I want to make sqldf work with RPostgreSQL to enable use of date functions which are lacking in SQLite and h2.

# packages
library(tidyverse)
# library(RH2) # comment out this row or the next to determine default SQL for sqldf
library(RPostgreSQL) 
library(sqldf)

Output confirms that "sqldf will default to using PostgreSQL"

Create some data:

set.seed(42)
N <- 1e6

sales <- tibble(
    buyer_id = 1:N/100,
    sales_date = sample(seq(as.Date('2018/01/01'), as.Date('2021/01/01'), by="day"), N, replace = TRUE),
    sales_amount = rpois(N, 200)
) 

Crash R:

sqldf("
select 
    max(sales_date) 
from sales
")

"R Session Aborted R encountered a fatal error The session was terminated"


Solution

  • If the only reason not to use H2 is date_trunc then here are some ways around that.

    1) Macros Here is a workaround for truncation to the beginning of year/quarter/month/week. These functions act as macros which expand into code accepted by H2. Be sure to prefix sqldf with fn$ and surround each with back quotes to turn on substitution. Note that in each case the argument is a string. Add the verbose=TRUE argument to sqldf to see the generated code.

    library(RH2)
    library(sqldf)
    
    trunc_year <- function(x) sprintf("DATEADD(day, 1-day_of_year(%s), %s)", x, x)
    trunc_qtr <- function(x) sprintf("DATEADD(month, 3*(quarter(%s)-1), %s)", x, 
      trunc_year(x))
    trunc_month <- function(x) sprintf("DATEADD(day, 1-day_of_month(%s), %s)", x, x)
    trunc_week <- function(x) sprintf("DATEADD(day, -iso_day_of_week(%s), %s)", x, x)
    
    # test
    DF <- data.frame(x = as.Date("2021-11-15"))
    fn$sqldf("select x, 
                    `trunc_year('x')` year, 
                    `trunc_qtr('x')` qtr,
                    `trunc_month('x')` month,
                    `trunc_week('x')` week
              from DF")
    ##            x       year        qtr      month       week
    ## 1 2021-11-15 2021-01-01 2021-10-01 2021-11-01 2021-11-14
    

    2) Patch RH2 Another possibility is to patch your installation of RH2 with the newer version of H2 which has date_trunc. To do this below we remove the the h2-1.3.175.jar file in RH2, which contains H2, and replace it with the newer version h2-1.4.200.jar. This should work as long as you have write permission in the java subdirectory of your RH2 installation. Just run this code in R and your RH2 installation will be patched.

    u <- "https://h2database.com/h2-2019-10-14.zip"
    z <- basename(u)
    tmp <- tempdir()  # create temporary dir 
    old.dir <- setwd(tmp)  # go to it
    download.file(u, z)  # download u
    unzip(z, "h2/bin/h2-1.4.200.jar")  # extract jar from zip file
    
    # copy new jar file to RH2 installation and remove old one
    new.jar <- file.path(tmp, "h2", "bin", "h2-1.4.200.jar")
    old.jar <- dir(system.file("java", package = "RH2"), "h2.*jar$", full.names = TRUE)
    if (basename(old.jar) != basename(new.jar)) {
      file.copy(new.jar, dirname(old.jar))
      file.remove(old.jar)
    }
    setwd(old.dir)  # return to original directory
    
    # test
    library(RH2)
    library(sqldf)
    sqldf("select h2version()")
    ##   '1.4.200'
    ## 1   1.4.200
    # see more tests in (3) below
    

    3) Rebuild RH2 Another possibility is create a new source version of RH2 with the newer version of H2 that has date_trunc. The first argument is 'year', 'quarter', 'month' or 'week' and the second argument is a date. Thus, if you are willing to rebuild RH2 with this new H2 version then it is available.

    (a) Download the source of RH2 from https://cran.r-project.org/package=RH2 and detar it to create a directory tree RH2.

    (b) Download this h2 zip https://h2database.com/h2-2019-10-14.zip and extract h2/bin/h2-1.4.200.jar from that zip file and place the jar file in the RH2/inst/java directory of the RH2 source removing the old one, h2-1.3.175.jar

    (c) rebuild and test RH2 like this:

    # rebuild RH2
    # cd to the RH2 directory containing its DESCRIPTION file
    setwd("...whatever.../RH2")
    
    # build RH2 
    #  on Windows this needs Rtools40 (not an R package)
    #  https://cran.r-project.org/bin/windows/Rtools/
    library(devtools)
    build()
    install(args = "--no-multiarch")
    
    # test
    library(RH2)
    library(sqldf)
    sqldf("select h2version()") # check it's the latest version
    ##   '1.4.200'
    ## 1   1.4.200
    
    DF <- data.frame(x = as.Date("2000-11-15"))
    sqldf("select date_trunc('month', x) from DF")
    ##   DATE_TRUNC('month', x)
    ## 1             2000-11-01