Tags: r, hadoop, hive, sparklyr

sparklyr can't see databases created in Hive and vice versa


I installed Apache Hive locally and was trying to read its tables via RStudio/sparklyr.

I created a database using Hive:

hive> CREATE DATABASE test;

and then tried to list it from R using the following script:

library(sparklyr)
library(dplyr)
library(DBI)

spark_disconnect_all()

Sys.setenv(SPARK_HOME = "/home/alessandro/spark-2.1.0-bin-hadoop2.7")

config <- spark_config()
config$spark.executor.instances <- 4
config$spark.executor.cores <- 4
config$spark.executor.memory <- "4G"
config$spark.sql.hive.metastore <- "/home/alessandro/spark-warehouse"
config$hive.metastore.warehouse.dir <- "/home/alessandro/spark-warehouse"
sc <- spark_connect(master="local", config=config, version="2.1.0")

dbGetQuery(sc, "show databases")

Still, dbGetQuery(sc, "show databases") does not list the newly created database, which is puzzling since the database folder test.db is correctly placed in the specified hive.metastore.warehouse.dir.

Similarly, if I create a database with dbGetQuery(sc, "CREATE DATABASE test2"), a database folder appears in hive.metastore.warehouse.dir, but I can't see the database from Hive using:

hive> show databases;

Basically, even though all the database folders end up in the same path, from Hive I can see only databases created via Hive, and from R I can see only databases created via R.


Solution

  • I solved the issue by adding to hive-site.xml the configuration that tells Spark to connect to the Hive metastore service:

      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://localhost:9083</value>
      </property>