I'm using the following code however it is creating multiple connections when calling the map function and they are not closing. As a result my rds database is getting flooded with connections. Is there any way to change this code to prevent so many connections?
connect.to.database <- function (dbname, schema = "public", host, port, user, pass) {
con <- dbConnect(RPostgres::Postgres(),
dbname = dbname,
user = user,
password = pass,
host = host,
port = port)
# this puts the schema in the search path, which means that instead of
# having to use <schema name>.<table name> you can just write <table name>
res <- dbSendQuery(con, paste0("SET search_path TO ",
dbQuoteIdentifier(con, schema),
", public"))
# check for errors
dbFetch(res)
dbClearResult(res)
con
}
schemas <- dbGetQuery(connect.to.database(dbname, "public", host, port, user, password), paste0("SELECT schema_name FROM information_schema.schemata"))
schema_names <- schemas %>% pull()
schemas_tables <- map(.x = schema_names,~dbGetQuery(connect.to.database(dbname, "public", host, port, user, password), paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ","'",.x,"'")) %>% mutate(schema_name = .x)) %>%
bind_rows()
Create a single global connection object and use it within map
. (I remove the unnecessary paste0
from your first query.)
conn <- connect.to.database(dbname, "public", host, port, user, password)
schema <- dbGetQuery(conn, "SELECT schema_name FROM information_schema.schemata")
schemas_tables <- map(
.x = schema$schema_name,
~ dbGetQuery(conn, paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ","'",.x,"'")) %>%
mutate(schema_name = .x)
) %>%
bind_rows()
You may want to consider parameterized queries vice constructing query strings manually. While there are security concerns about malicious SQL injection (e.g., XKCD's Exploits of a Mom aka "Little Bobby Tables"), it is also a concern for malformed strings or Unicode-vs-ANSI mistakes, even if it's a single data analyst running the query. Both DBI
(with odbc
) and RODBC
support parameterized queries, either natively or via add-ons.
That would change this to:
schemas_tables <- map(
.x = schema$schema_name,
~ dbGetQuery(conn, "SELECT table_name FROM information_schema.tables WHERE table_schema = ?",
params = list(.x)) %>%
mutate(schema_name = .x)
) %>%
bind_rows()
But frankly, I think it might be much easier to use IN
instead of =
. Again, using parameter-binding.
schemas_tables <- dbGetQuery(conn, "SELECT table_name FROM information_schema.tables WHERE table_schema IN (?)",
params = list(schema$schema_name))
(No map
required.)
Or I believe you can do it in one query, not two.
dbGetQuery(conn, "
select table_name
from information_schema.tables
where table_schema in (
select schema_name from information_schema.schemata
)")
... to close the connection when you are done.
dbDisconnect(conn)