Search code examples
sqlrdplyrdbplyr

R and dplyr: How can I use compute() to create a persistent table from SQL query in a different schema than the source schema?


I have a question similar to this Stackoverflow post.

How can I create a persistent table from a SQL query in a database (I use a DB2 database)? My goal is to use a table from one schema and to permanently create a more or less modified table in another schema.

What works so far is to pull the data to R and subsequently create a table in a different schema:

dplyr::tbl(con, in_schema("SCHEMA_A", "TABLE")) %>%
collect() %>% 
DBI::dbWriteTable(con, Id(schema = "SCHEMA_B", table = "NEW_TABLE"), ., overwrite = TRUE)

However, I'd like to incorporate the compute() function in a dplyr pipeline such that I do not have to pull the data into R, that is, I'd like keep the data on the database. As a side note: I do not know how I would substitute the DBI'sdbWriteTable() for dplyr's copy_to() – being able to do that would also help me.

Unfortunately, I am not able to make it work, even after reading ?compute() and its online documentation. The following code framework does not work and results in an error:

dplyr::tbl(con, in_schema("SCHEMA_A", "TABLE")) %>%
dplyr::compute(in_schema("SCHEMA_B", "NEW_TABLE"), analyze = FALSE, temporary = FALSE)

Is there a solution for using compute() or some other solution applicable to a dplyr pipeline?


Solution

  • I use a custom function that takes the SQL query behind a remote table, converts in into a query that can be executed on the SQL server to save a new table, and then executes that query using the DBI package. Key details below, full details (and other functions I find useful) in my GitHub repository here.

    write_to_database <- function(input_tbl, db_connection, db, schema, tbl_name){
      # SQL query
      sql_query <- glue::glue("SELECT *\n",
                              "INTO {db}.{schema}.{tbl_name}\n",
                              "FROM (\n",
                              dbplyr::sql_render(input_tbl),
                              "\n) AS from_table")
      
      # run query
      DBI::dbExecute(db_connection, as.character(sql_query))
    }
    

    The essence of the idea is to construct an SQL query that if you executed it in your database language directly, would give you the desired outcome. In my application this takes the form:

    SELECT *
    INTO db.schema.table
    FROM (
      /* sub query for existing table */
    ) AS alias
    

    Note that this is using SQL server, and your particular SQL syntax might be different. INTO is the SQL server pattern for writing a table. In the example linked to in the question, the syntax is TO TABLE.