I have a postgresql database connection and want to get a table from the database. Presumably it's good practice to keep the connection info in a different file? I have two files just now:
#getthetable.R
library(tidyverse)
library(dbplyr)
## connect to db
con <- src_postgres(dbname = "thedbname",
host = "blablabla.amazonaws.com",
port = NULL,
user = "myname",
password = "1234")
thetable <- tbl(con, "thetable") %>% select(id, apples, carrots) %>% collect
And then:
#main.R
library(tidyverse)
## get data from getthetable script with connection
source("rscripts/getthetable.R")
This now makes both con
and thetable
variables available in main.R. I just want the variable thetable
from getthetable.R. How do I do that? Leaving out con variable?
Also, is there a best practice here when working with db connections in r? Is my thinking logical? Are there drawbacks to what I'm doing or do most people just put the connection in together with the main scripts?
I also like to capture such things (like connections) in a different file, but also in an designated environment like this:
ConnectionManager <- local({
con <- src_postgres(dbname = "thedbname",
host = "blablabla.amazonaws.com",
port = NULL,
user = "myname",
password = "1234")
collectTable <- function() {
tbl(con, "thetable") %>% select(id, apples, carrots) %>% collect
}
list(collectTable = collectTable)
})
This way you have only one object ConnectionManager
after sourcing the file and can get the table with ConnectionManager$collectTable()
. Additionally you can easily extend it to fetch other tables or to include some connection utility functions.