I have a Netezza SQL server I connect to using DBI::dbConnect. The server has multiple databases we will name db1 and db2.
I would like to use dbplyr as much as possible and skip having to write SQL code in RODBC::sqlQuery(), but I am not sure how to do the following:
1) How to read a table in db1, work on it and have the server write the result into a table in db2 without going through my desktop?
2) How to do a left join between a table in db1 and another in db2?
It looks like there might be a way to connect to database = "SYSTEM" instead of database = "db1" or "db2", but I am not sure what the next step would be.
con <- dbConnect(odbc::odbc(),
                 driver = "NetezzaSQL",
                 database = "SYSTEM",
                 uid = Sys.getenv("netezza_username"),
                 pwd = Sys.getenv("netezza_password"),
                 server = "NETEZZA_SERVER",
                 port = 5480)
I work around this problem on SQL Server using in_schema and dbExecute as follows. Assuming Netezza is not too different, the same approach should apply.
The first problem is to connect to both tables via the same connection. If we use separate connections, then joining the two tables results in data being copied from one connection to the other, which is very slow.
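For contrast, here is a rough sketch of the slow pattern to avoid: with two separate connections (the dbConnect arguments are placeholders, as below), dplyr cannot join the tables directly and has to copy one of them through your machine.

con1 <- dbConnect(...) # connection defaulting to db1
con2 <- dbConnect(...) # connection defaulting to db2
t1 <- dplyr::tbl(con1, "table_name_1")
t2 <- dplyr::tbl(con2, "table_name_2")
# left_join(t1, t2, by = "id")              # errors: the tables come from different sources
# left_join(t1, t2, by = "id", copy = TRUE) # works, but copies t2 through R into a temp table, which is slow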
con <- dbConnect(...) # as required by your database
table_1 <- dplyr::tbl(con, from = dbplyr::in_schema("db1", "table_name_1"))
table_2 <- dplyr::tbl(con, from = dbplyr::in_schema("db2.schema2", "table_name_2"))
While in_schema is intended for passing schema names, you can also use it to pass the database name (or both, with a dot in between).
The following should now work without issue:
# check connection
head(table_1)
head(table_2)
# test join code
left_join(table_1, table_2, by = "id") %>% show_query()
# check left join
left_join(table_1, table_2, by = "id") %>% head()
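If you then want to keep working on the joined result before saving it, the manipulation stays lazy and runs on the server. A minimal sketch (the columns value and amount are hypothetical, not from the question); this also defines the table_to_save object used below:

table_to_save <- left_join(table_1, table_2, by = "id") %>%
  filter(!is.na(value)) %>%                    # 'value' is a hypothetical column
  group_by(id) %>%
  summarise(total = sum(amount, na.rm = TRUE)) # 'amount' is hypothetical too
# still lazy: nothing has been pulled to your desktop yet
show_query(table_to_save)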
A remote table is defined by two things:
1) the connection, and
2) the query that defines the table (shown by show_query).
We can use these with dbExecute to write to the database. My example will be with SQL Server (which uses INTO as the keyword); you'll have to adapt it to your own environment if the SQL syntax is different.
# optional, extract connection from table-to-save
con <- table_to_save$src$con
# SQL query
sql_query <- paste0("SELECT *\n",
"INTO db1.new_table \n", # the database and name you want to save
"FROM (\n",
dbplyr::sql_render(table_to_save),
"\n) subquery_alias")
# run query
dbExecute(con, as.character(sql_query))
The idea is to create a query, executable by the database, that writes the new table. I have done this by treating the existing query as a subquery in the SELECT ... INTO ... FROM (...) subquery_alias pattern.
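For Netezza specifically, I believe the equivalent is CREATE TABLE ... AS rather than SELECT ... INTO, but please verify against your environment. A hedged sketch (the db2.schema2.new_table qualifier is an assumption, matching the in_schema("db2.schema2", ...) usage above):

# assumes Netezza-style CREATE TABLE ... AS syntax; adjust if your server differs
sql_query <- paste0("CREATE TABLE db2.schema2.new_table AS\n",
                    dbplyr::sql_render(table_to_save))
dbExecute(con, as.character(sql_query))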
Some additional notes: If the output of show_query or sql_render would work when you access the database directly, then the above should work (all that changes is that the command arrives via R instead of via the SQL console). dbExecute will error if the table already exists in the database, so I recommend checking for this first. In my experience, in_schema has not worked for creating views; to create (or delete) a view I have to ensure the connection is to the database where I want the view.
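For the existence check, here is a minimal sketch using DBI::dbExistsTable (whether the qualified DBI::Id identifier resolves across databases on your server is an assumption worth testing):

# guard against overwriting an existing table before running the write query
new_table_id <- DBI::Id(catalog = "db2", schema = "schema2", table = "new_table") # hypothetical target
if (DBI::dbExistsTable(con, new_table_id)) {
  stop("db2.schema2.new_table already exists")
}
dbExecute(con, as.character(sql_query))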