I am very new to using servers for data analysis, and I am completely baffled by what a "tbl_df""tbl""data.frame" object, and "tbl_Redshift" "tbl_dbi" "tbl_sql" "tbl_lazy" "tbl" is in R? I know that something related to the Redshift server and tbl-lazy tables but that's it.
I have been following on a previous analyst's work, and I have used collect() to pull a bunch of data onto the local machine, doing some transformation before trying to upload it back to the server. However, I cannot upload the data using compute() for some reason, and I believe the reason is because the object itself became "tbl_df""tbl""data.frame" from "tbl_Redshift" "tbl_dbi" "tbl_sql" "tbl_lazy" "tbl".
Is there a way to transform this back so I can upload it? Thanks in advance.
Let's begin by ensuring the core concepts are clear:
tbl_df
, tbl
, and data.frame
are all local objects where the data exist in local R memory.tbl_Redshift
, tbl_dbi
, and tbl_sql
are all remote objects where the data exists in the database and R has a pointer/query/connection to the database in local R memory.The collect
command retrieves data from a remote database into local R memory. Hence it converts an object of type tbl_Redshift
, tbl_dbi
, or tbl_sql
into an object of type tbl_df
, tbl
, 0r data.frame
.
The compute
command works for remote objects (tbl_Redshift
, tbl_dbi
, or tbl_sql
) and saves an temporary/intermediate stage of a calculation to the database, creating a new remote object. This helps reduce additional computation. It can not be used to copy data from local R memory.
Put another way:
collect
: remote data input -> local data outputcompute
: remote data input -> remote date outputIt sounds like what you need is : local data input -> remote data output.
For this, there are two broad options:
collect
. If the data is never copied from the server into local R memory then you never have to transfer it back.DBI::dbWriteTable
command for this purpose. Example function below.copy_from_r_to_sql <- function(db_conn, db, schema, sql_table_name, r_table_name) {
DBI::dbWriteTable(
db_conn,
DBI::Id(
catalog = db,
schema = schema,
table = sql_table_name
),
r_table_name
)
}
(This is a cut down version of one of my dbplyr helpers here)