I installed Apache Hive locally and was trying to read its tables via RStudio/sparklyr.
I created a database using Hive:
hive> CREATE DATABASE test;
and tried to read that database using the following R script:
library(sparklyr)
library(dplyr)
library(DBI)
spark_disconnect_all()
Sys.setenv(SPARK_HOME = "/home/alessandro/spark-2.1.0-bin-hadoop2.7")
config <- spark_config()
config$spark.executor.instances <- 4
config$spark.executor.cores <- 4
config$spark.executor.memory <- "4G"
config$spark.sql.hive.metastore <- "/home/alessandro/spark-warehouse"
config$hive.metastore.warehouse.dir <- "/home/alessandro/spark-warehouse"
sc <- spark_connect(master="local", config=config, version="2.1.0")
dbGetQuery(sc, "show databases")
Still, dbGetQuery(sc, "show databases") does not show the database I created, which is odd, since the database folder test.db is correctly placed in the specified hive.metastore.warehouse.dir.
Similarly, if I create a database using dbGetQuery(sc, "CREATE DATABASE test2"), a database folder is created in hive.metastore.warehouse.dir, but I can't see it from Hive using:
hive> show databases;
Basically, even though all the database folders are placed in the correct path, from Hive I can only see databases created via Hive, and from R I can only see databases created via R. The reason is that the two sides are not sharing a metastore: when Spark is not told where Hive's metastore lives, it creates its own embedded Derby metastore, so each system only knows about its own databases.
I solved the issue by adding to hive-site.xml the configuration that tells Spark how to connect to the Hive metastore:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>
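For this to work, the Hive metastore service has to be running on that port (it can be started with hive --service metastore), and hive-site.xml has to be visible to Spark, e.g. by copying it into $SPARK_HOME/conf. As a minimal sketch of an alternative, the same URI can be passed from the R side through the sparklyr config instead of editing hive-site.xml; this assumes Spark's standard spark.hadoop.* prefix, which forwards an option into the Hadoop/Hive configuration, and the same thrift://localhost:9083 endpoint as above:

library(sparklyr)
library(DBI)

config <- spark_config()
# Point Spark's Hive support at the external metastore instead of letting
# it create its own embedded Derby metastore:
config$spark.hadoop.hive.metastore.uris <- "thrift://localhost:9083"

sc <- spark_connect(master = "local", config = config, version = "2.1.0")
dbGetQuery(sc, "show databases")  # now also lists the databases created via Hive
spark_disconnect(sc)

Either way, the key point is that Hive and Spark must talk to the same metastore service; once they do, databases created on either side show up on both.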