Search code examples
rdbplyrduckdb

saving and reading a duckdb database in R


I'm learning more about duckdb and exploring the various ways to "save" a database. I understand I should be able to save a duckdb database however when I try to save it, I get an error.

Error: unable to find an inherited method for function ‘dbWriteTable’ for signature ‘conn = "duckdb_connection", name = "character", value = "tbl_duckdb_connection"’

Ultimately, I am experimenting with the best ways to save and load data to duckdb by trying to save a database itself (not just the csv or parquet files).

If you have any thoughts on the benefits or challenges of this approach vs. csv /parquet please let me know.

After saving the database, I'd also like to load it as well (which I give the code for but sure if it works because I can't save it)

Lastly, I created the duckdb connection via a temp file which I understand will create a persistence file and allow me to work out of memory vs. the regular duckdb() which limits me to in memory (https://josiahparry.com/posts/2024-05-24-duckdb-and-r)

Any suggestions?

library(tidyverse)
library(duckdb)

# this works -- but I have limited understanding of the implications between duckdb() only vs. the below solution

drv <- duckdb(tempfile(fileext = ".duckdb"))

# this works
con <- dbConnect(drv)

# this works
duckdb_register(con,"diamonds",diamonds,overwrite = TRUE)

# this works
diamonds_db <- dplyr::tbl(con, sql("select * from diamonds"))

# this doesn't work -- I'm trying to save this as duckdb database object (as oppose to a csv file or parquet file) -- I am assuming this is better saving format (eg. lighter?)

duckdb::dbWriteTable(
  con
  ,"data/diamonds.duckdb"
  ,diamonds_db
  ,append = TRUE)
)

Ideally if I can figure out how to save it, I can also read from it with what I assume will work as below

library(tidyverse)
library(duckdb)

con <- dbConnect(
  duckdb::duckdb(),
  dbdir = "data/diamonds.duckdb"
)

# I don't know if this works
df <- dbReadTable(con, "diamonds")

Solution

  • thanks to @G. Grothendieck, the problem was my approach is two fold:

    1. I was using a temporary file for the database, so even if I was successful in saving a table to it, the database itself wouldn't save
    2. I was using dudckdb::duckdb_register() which creates a temporary table (that you can use and collect) but won't actually save it to database.

    Below is the corrected code

    library(tidyverse)
    library(duckdb)
    
    drv <- duckdb(dbdir ="database.duckdb")
    
    
    con <- dbConnect(drv)
    
    
    duckdb::dbWriteTable(
      con
      ,"diamonds.dbi"
      ,diamonds
      ,append = TRUE)
    )
    

    This will successfully save your table your database.

    If you quite and start a new session, simply reconnect to your database and then use dplyr::tbl() to access it (no need to read csv files).

    
    drv <- duckdb(dbdir ="database.duckdb")
    
    
    con <- dbConnect(drv)
    
    ## this loads your table from your persistance database
    
    tbl(con,sql("select * from 'diamonds.dbi'"))
    

    Ensure you wrap reference to your table with ' or it won't work.