r, apache-spark, databricks, sparkr, sparklyr

Use multiple Spark connections in a Databricks notebook


I want to connect to two database tables in a single Databricks (R) notebook using Spark and sparklyr:

sc_one <- spark_connect(method="databricks")
tbl_change_db(sc_one, "first_database_name")
first_var <- spark_read_table(sc_one, "first_table_name")

sc_two <- spark_connect(method="databricks")
tbl_change_db(sc_two, "second_database_name")
second_var <- spark_read_table(sc_two, "second_table_name")

The above doesn't work because one connection seems to overwrite the other: when I try to read the first table, I receive an error that first_table_name cannot be found. How can I use two Spark connections in a single Databricks notebook?


Solution

  • I believe what you ultimately want is to read two separate tables, residing in different databases, as two separate Spark DataFrames. You can do this over a single connection by specifying the database name along with the table name, in the following way:

    sc <- spark_connect(method="databricks")
    
    first_var <- spark_read_table(sc, "first_table_name",
                                  options=list(dbtable="first_database_name.first_table_name"))
    
    second_var <- spark_read_table(sc, "second_table_name",
                                   options=list(dbtable="second_database_name.second_table_name"))
    

    You may check which databases are available to the cluster using src_databases(sc).
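
    If you prefer dplyr-style lazy tables, a similar sketch (assuming dbplyr is available alongside sparklyr, and reusing the placeholder database and table names from the question) qualifies each table with dbplyr::in_schema() over the same single connection, so no tbl_change_db() calls are needed:

    library(sparklyr)
    library(dplyr)

    sc <- spark_connect(method = "databricks")

    # One connection; each reference is qualified with its database
    first_var  <- tbl(sc, dbplyr::in_schema("first_database_name", "first_table_name"))
    second_var <- tbl(sc, dbplyr::in_schema("second_database_name", "second_table_name"))

    Both objects are lazy Spark tables; call collect() on them if you need local R data frames.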