I am using sqldf
to train some r-users in SQL. Loading RPostgreSQL
or RH2
before loading sqldf
will change the default SQL used by sqldf
. This works fine with h2, but every time I load RPostgreSQL
, R will crash on first attempted query. I want to make sqldf
work with RPostgreSQL
to enable use of date functions which are lacking in SQLite
and h2
.
# packages
library(tidyverse)
# library(RH2) # comment out this row or the next to determine default SQL for sqldf
library(RPostgreSQL)
library(sqldf)
Output confirms that "sqldf will default to using PostgreSQL"
Create some data:
set.seed(42)
N <- 1e6
sales <- tibble(
buyer_id = 1:N/100,
sales_date = sample(seq(as.Date('2018/01/01'), as.Date('2021/01/01'), by="day"), N, replace = TRUE),
sales_amount = rpois(N, 200)
)
Crash R:
sqldf("
select
max(sales_date)
from sales
")
"R Session Aborted R encountered a fatal error The session was terminated"
If the only reason not to use H2 is date_trunc
then here are some ways around that.
1) Macros Here is a workaround for truncation to the beginning of year/quarter/month/week. These functions act as macros which expand into code accepted by H2. Be sure to prefix sqldf with fn$ and surround each with back quotes to turn on substitution. Note that in each case the argument is a string. Add the verbose=TRUE
argument to sqldf to see the generated code.
library(RH2)
library(sqldf)
trunc_year <- function(x) sprintf("DATEADD(day, 1-day_of_year(%s), %s)", x, x)
trunc_qtr <- function(x) sprintf("DATEADD(month, 3*(quarter(%s)-1), %s)", x,
trunc_year(x))
trunc_month <- function(x) sprintf("DATEADD(day, 1-day_of_month(%s), %s)", x, x)
trunc_week <- function(x) sprintf("DATEADD(day, -iso_day_of_week(%s), %s)", x, x)
# test
DF <- data.frame(x = as.Date("2021-11-15"))
fn$sqldf("select x,
`trunc_year('x')` year,
`trunc_qtr('x')` qtr,
`trunc_month('x')` month,
`trunc_week('x')` week
from DF")
## x year qtr month week
## 1 2021-11-15 2021-01-01 2021-10-01 2021-11-01 2021-11-14
2) Patch RH2 Another possibility is to patch your installation of RH2 with the newer version of H2 which has date_trunc
. To do this below we remove the the h2-1.3.175.jar file in RH2, which contains H2, and replace it with the newer version h2-1.4.200.jar. This should work as long as you have write permission in the java subdirectory of your RH2 installation. Just run this code in R and your RH2 installation will be patched.
u <- "https://h2database.com/h2-2019-10-14.zip"
z <- basename(u)
tmp <- tempdir() # create temporary dir
old.dir <- setwd(tmp) # go to it
download.file(u, z) # download u
unzip(z, "h2/bin/h2-1.4.200.jar") # extract jar from zip file
# copy new jar file to RH2 installation and remove old one
new.jar <- file.path(tmp, "h2", "bin", "h2-1.4.200.jar")
old.jar <- dir(system.file("java", package = "RH2"), "h2.*jar$", full.names = TRUE)
if (basename(old.jar) != basename(new.jar)) {
file.copy(new.jar, dirname(old.jar))
file.remove(old.jar)
}
setwd(old.dir) # return to original directory
# test
library(RH2)
library(sqldf)
sqldf("select h2version()")
## '1.4.200'
## 1 1.4.200
# see more tests in (3) below
3) Rebuild RH2 Another possibility is create a new source version of RH2 with the newer version of H2 that has date_trunc
. The first argument is 'year', 'quarter', 'month' or 'week' and the second argument is a date. Thus, if you are willing to rebuild RH2 with this new H2 version then it is available.
(a) Download the source of RH2 from https://cran.r-project.org/package=RH2 and detar it to create a directory tree RH2.
(b) Download this h2 zip https://h2database.com/h2-2019-10-14.zip and extract h2/bin/h2-1.4.200.jar from that zip file and place the jar file in the RH2/inst/java directory of the RH2 source removing the old one, h2-1.3.175.jar
(c) rebuild and test RH2 like this:
# rebuild RH2
# cd to the RH2 directory containing its DESCRIPTION file
setwd("...whatever.../RH2")
# build RH2
# on Windows this needs Rtools40 (not an R package)
# https://cran.r-project.org/bin/windows/Rtools/
library(devtools)
build()
install(args = "--no-multiarch")
# test
library(RH2)
library(sqldf)
sqldf("select h2version()") # check it's the latest version
## '1.4.200'
## 1 1.4.200
DF <- data.frame(x = as.Date("2000-11-15"))
sqldf("select date_trunc('month', x) from DF")
## DATE_TRUNC('month', x)
## 1 2000-11-01