Search code examples
rwindowssqlitesqldfrsqlite

enabling SQLite's generate_sequence to be used in R's sqldf library under MS windows


I am hoping to use SQLite3's generate_series Table-Valued Function from within R's sqldf library, which in turn uses the RSQLite library. I expect once installed/configured, I will be able to call it as:

sqldf('SELECT value FROM generate_series(5,100,5)')

From reading Run-Time Loadable Extensions, I understand I will need to first call load_extension(X), specifically as

sqldf('select load_extension("C:\\my\\path\\to\\sqlite\\pathext\\misc\\series.c")')

However, I also read that Extension loading is off by default, and I can not figure out how I will have to turn it on from within a sqldf call. Is this possible at all? Will I have to compile my own sqlite3 with this default toggled?

Further, I would like to confirm that there is no shortcut to Compiling A Loadable Extension myself with MinGW, because apparently without compiling ext/misc/series.c, I will find that SQLite generate_series is missing, and none of the Downloads provide it compiled.

If I understand correctly, I will NOT have to compile all of SQLite, I can use the downloaded version in conjunction with the extension compiled by me.

Finally, how do I assure that the version of sqlite3 that sqldf / RSQLite picks up is the one I've configured? Will this force me to compile my own RSQLite rather than simply install.packages(c('RSQLite'))?

All this seems like a lot of work just for the sake of this one extension. Is there a better way to go?

Thanks for your helps!


Solution

  • Suggest you submit an issue on the RSQLite github site for it to be added but in the meantime here are some workarounds that don't involve messing with rebuilding SQLite and RSQLite.

    1) Recursive CTE This can be done without the extension using a CTE as shown in https://www.sqlite.org/series.html .

    To be specific try this sqldf code:

    library(sqlite)
    
    start <- 1
    end <- 3
    step <- 1
    fn$sqldf("
      WITH RECURSIVE generate_series(value) AS (
        SELECT $start
        UNION ALL
        SELECT value+$step FROM generate_series
         WHERE value+$step<=$end
    ) SELECT value FROM generate_series
    ")
    ##   value
    ## 1     1
    ## 2     2
    ## 3     3
    

    2) Non-recursive CTE Suppose we have a data frame known to have at least the required number of rows. It's content does not otherwise matter. For example, the built in BOD data frame has 6 rows and suppose we want 3 rows which is no more than 6 so it will work.

    fn$sqldf("
      WITH generate_series(value) AS (
        SELECT $start + $step * (rowid - 1) FROM BOD LIMIT ($end - $start) / $step + 1
      ) SELECT value FROM generate_series
    ")
    ##   value
    ## 1     1
    ## 2     2
    ## 3     3
    

    If start and step are 1 this can be simplified to:

    fn$sqldf("
      WITH generate_series(value) AS (
        SELECT rowid FROM BOD LIMIT $end
      ) SELECT value FROM generate_series
    ")
    

    3) Create in R Another possibility is to create the series in R and then just use it in sqldf:

    generate_series <- data.frame(value = seq(start, end, step))
    sqldf("SELECT value FROM generate_series")
    ##   value
    ## 1     1
    ## 2     2
    ## 3     3