Tags: r, apache-spark, amazon-emr, sparkr, sparklyr

sparklyr: read a database table into a distributed DataFrame


Hi, I am trying to figure out whether there is a way to read a database table directly into a distributed Spark DataFrame from sparklyr. I have RStudio installed on an EMR cluster that hosts my Hive metastore.

I know I can do the following:

library(sparklyr)
library(dplyr)
library(DBI)

sc <- spark_connect(master = "local")

# Run the query through Spark, collect the full result into local R memory,
# then copy it back into Spark as a distributed DataFrame
query <- "select * from schema.table"
result <- dbGetQuery(sc, query)
result_t <- copy_to(sc, result)

but is there a way to load the table directly into result_t as a distributed Spark DataFrame, without collecting it into local R memory first?


Solution

  • As @kevinykuo suggested, use dplyr's tbl() to reference the table directly:

    result_t <- tbl(sc, "schema.table")
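
This returns a lazy reference to the Hive table rather than collecting rows into R: subsequent dplyr verbs are translated to Spark SQL and executed on the cluster. A minimal sketch of how the lazy table might be used (some_column is a hypothetical column name, and master = "local" is carried over from the question):

    library(sparklyr)
    library(dplyr)

    sc <- spark_connect(master = "local")

    # Lazy reference: no rows are pulled into R at this point
    result_t <- tbl(sc, "schema.table")

    # dplyr verbs are translated to Spark SQL and run on the cluster;
    # some_column is a hypothetical column name
    summary_t <- result_t %>%
      filter(!is.na(some_column)) %>%
      count(some_column)

    # collect() materializes the (now small) result in local R memory
    summary_df <- collect(summary_t)

Only call collect() after filtering or aggregating, so that what comes back to the R session is small.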