I have a connection to our database:
con <- dbConnect(odbc::odbc(), "myHive")
I know the con is working because I can see all of our schemas in the upper right connections pane.
If I want to select a particular table as a tbl I can go:
mytbl <- tbl(con, in_schema("mydb", "mytable"))
I expected the reult here to be a df or tbl however it's a list of 2. I was trying to extract the tbl part but am getting confused by this list.
Here is the str:
> str(mytbl)
List of 2
$ src:List of 2
..$ con :Formal class 'Hive' [package ".GlobalEnv"] with 4 slots
.. .. ..@ ptr :<externalptr>
.. .. ..@ quote : chr "`"
.. .. ..@ info :List of 13
.. .. .. ..$ dbname : chr "HIVE"
.. .. .. ..$ dbms.name : chr "Hive"
.. .. .. ..$ db.version : chr "1.2.2"
.. .. .. ..$ username : chr ""
.. .. .. ..$ host : chr ""
.. .. .. ..$ port : chr ""
.. .. .. ..$ sourcename : chr "gdHive"
.. .. .. ..$ servername : chr "Hive"
.. .. .. ..$ drivername : chr "Hortonworks Hive ODBC Driver"
.. .. .. ..$ odbc.version : chr "03.52"
.. .. .. ..$ driver.version : chr "2.6.1.1001"
.. .. .. ..$ odbcdriver.version : chr "03.80"
.. .. .. ..$ supports.transactions: logi FALSE
.. .. .. ..- attr(*, "class")= chr [1:3] "Hive" "driver_info" "list"
.. .. ..@ encoding: chr ""
..$ disco: NULL
..- attr(*, "class")= chr [1:3] "src_dbi" "src_sql" "src"
$ ops:List of 2
..$ x : 'ident_q' chr "mydb.mytable"
..$ vars: chr [1:188] "zzz1.order_id" "zzz1.row_id" "zzz1.order_ts" "zzz1.order_date" ...
..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
- attr(*, "class")= chr [1:4] "tbl_dbi" "tbl_sql" "tbl_lazy" "tbl"
What am I looking at here? How can I get the table from mydb.mytable as a tbl?
If you are looking to load the table into R memory (rather than work with it remotely) use mytbl %>% collect()
.
Otherwise, I think what you are looking at is a standard remote table. This is not particular to in_schema
. If you try class(mytbl)
you should expect to see "tbl"
as one of its classes.
By default R does not load data from a remote table into memory. To view the first few rows try: mytbl %>% head()
.
You can manipulate the table using all the standard dplyr commands. For example:
results = mytbl %>%
rename(new_name = old_name) %>%
mutate(new_col = 2*old_col) %>%
group_by(new_col) %>%
summarise(number = n()) %>%
filter(number > 1000)
And then when you want the result in R: results = results %>% collect()
.