I am using the tidyverse to connect to multiple databases that share the same data structure (clusters). Because the databases come from different sources, a union is not possible without copying the data locally.
I can do everything with long-winded code, but when I try to shorten it I run into the following problem: when defining the column names for the select statement, dbplyr stores the expression with the looping variable in the lazy query instead of evaluating the string and storing the result.
Here is a minimal reproducible example:
library(tidyverse)
# reproducible example with two databases and two tables in memory
con1 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
con2 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con1, mtcars)
copy_to(con1, iris)
copy_to(con2, mtcars)
copy_to(con2, iris)
# names of the tables
tables <- c("mtcars", "iris")
# specify which columns to select from which table
columns <- list("mtcars" = c("mpg", "hp"),
                "iris"   = c("Sepal.Length", "Sepal.Width"))
# list to store the results in
data_list <- vector(mode = "list", length = length(tables))
names(data_list) <- tables
# loop over tables
for (i in tables) {
  # loop over databases
  for (j in 1:2) {
    data_list[[i]][[j]] <- tbl(get(paste0("con", j)), i) %>% select(columns[[i]])
  }
}
This code works fine so far, but the problem appears when accessing the data stored in the list (data_list).
If I try
data_list[[1]][[1]]
R still evaluates
select(columns[[i]])
After the loop has finished, i is "iris", and the statement gives the error message:
Error: Unknown columns Sepal.Length and Sepal.Width
For the second list in data_list it works fine because i is set appropriately:
data_list[[2]][[1]]
How can I force the select statement to evaluate the expression rather than store the expression with the looping variable i?
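In other words, I want the query stored for iris to behave as if I had written the column names literally, independent of the current value of i, i.e. as if I had typed:
tbl(con1, "iris") %>% select(Sepal.Length, Sepal.Width)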
In the next step, I would also like to add a filter expression, so that I don't have to collect all the data but only the rows that are needed.
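Roughly what I have in mind (an untested sketch; hp > 100 is just a placeholder condition, not my real filter):
tbl(con1, "mtcars") %>%
  select(columns[["mtcars"]]) %>%   # select only the needed columns
  filter(hp > 100) %>%              # filter on the database side
  collect()                         # pull only the remaining rows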
If union worked across databases without copying, that would solve all of these problems.
Thx and best regards Thomas
Hmm, you mean that you want to select columns interactively after you've queried the database?
I edited your code to use the tidyverse functions (since you've already loaded them).
library(tidyverse)
# Reproducible example with two databases and two tables in memory
con1 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
con2 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con1, mtcars)
copy_to(con1, iris)
copy_to(con2, mtcars)
copy_to(con2, iris)
# Specify which columns to select from which table
columns <- list("mtcars" = c("mpg", "hp"), "iris" = c("Sepal.Length", "Sepal.Width"))
# Loop over the table names (mtcars, iris) **and** the columns that belong to those datasets
data_list <-
map2(names(columns), columns, ~ {
# For each table/column combination, grab them from con1 and con2 and return them in a list
con1_db <- tbl(con1, .x) %>% select(.y)
con2_db <- tbl(con2, .x) %>% select(.y)
list(con1_db, con2_db)
}) %>%
setNames(names(columns))
# With this you can interactively select the columns you want from each dataset. Just replace "iris" with the dataset you're interested in.
data_list %>%
pluck("iris") %>%
map(select, columns[['iris']])
#> [[1]]
#> Warning: `overscope_eval_next()` is deprecated as of rlang 0.2.0.
#> Please use `eval_tidy()` with a data mask instead.
#> This warning is displayed once per session.
#> Warning: `overscope_clean()` is deprecated as of rlang 0.2.0.
#> This warning is displayed once per session.
#> # Source: lazy query [?? x 2]
#> # Database: sqlite 3.30.1 [:memory:]
#> Sepal.Length Sepal.Width
#> <dbl> <dbl>
#> 1 5.1 3.5
#> 2 4.9 3
#> 3 4.7 3.2
#> 4 4.6 3.1
#> 5 5 3.6
#> 6 5.4 3.9
#> 7 4.6 3.4
#> 8 5 3.4
#> 9 4.4 2.9
#> 10 4.9 3.1
#> # … with more rows
#>
#> [[2]]
#> # Source: lazy query [?? x 2]
#> # Database: sqlite 3.30.1 [:memory:]
#> Sepal.Length Sepal.Width
#> <dbl> <dbl>
#> 1 5.1 3.5
#> 2 4.9 3
#> 3 4.7 3.2
#> 4 4.6 3.1
#> 5 5 3.6
#> 6 5.4 3.9
#> 7 4.6 3.4
#> 8 5 3.4
#> 9 4.4 2.9
#> 10 4.9 3.1
#> # … with more rows
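If you'd rather keep your original loop, the underlying issue is that dbplyr builds the query lazily, so columns[[i]] is only looked up when the result is printed or collected, and by then i has moved on. Unquoting the vector with !! should splice in its current value at the time the query is built (an untested sketch based on your loop):
for (i in tables) {
  for (j in 1:2) {
    # !! forces columns[[i]] to be evaluated now, so the resulting
    # character vector is stored in the query instead of the expression
    data_list[[i]][[j]] <- tbl(get(paste0("con", j)), i) %>%
      select(!!columns[[i]])
  }
}
As for the union: as far as I know, dplyr can't union lazy tables that live on two different connections, so filtering each lazy table on the database side, collect()ing the much smaller results, and stacking them locally with bind_rows() is probably the closest you can get without copying whole tables between databases.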