Tags: apache-spark, hadoop, pyspark, hive

Connect to Hive metastore from remote Spark


I have a Hadoop cluster with Hive and Spark installed. In addition, I have a separate workstation machine, and I am trying to connect to the cluster from it.

I installed Spark on this machine and tried to connect using the following command:

pyspark --name testjob --master spark://hadoop-master.domain:7077

As a result, I see the running application on the Spark Web UI page.

I want to connect to the Hive database (in the cluster) from my workstation, but I can't. I have put a hive-site.xml config in the Spark conf directory on my local workstation, with the following contents:

<configuration>
  <property>
    <name>metastore.thrift.uris</name>
    <value>thrift://hadoop-master.domain:9083</value>
    <description>IP address (or domain name) and port of the metastore host</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://hadoop-master.domain:9000/user/hive/warehouse</value>
    <description>Warehouse location</description>
  </property>
  <property>
    <name>metastore.warehouse.dir</name>
    <value>hdfs://hadoop-master.domain:9000/user/hive/warehouse</value>
    <description>Warehouse location</description>
  </property>
  <property>
    <name>spark.sql.hive.metastore.version</name>
    <value>3.1.0</value>
    <description>Metastore version</description>
  </property>
</configuration>

I tried this construction, but I can't make it work with the external Hive databases:

from pyspark.sql import SparkSession

spark = SparkSession \
 .builder \
 .appName('test01') \
 .config('hive.metastore.uris', "thrift://hadoop-master.domain:9083") \
 .config("spark.sql.warehouse.dir", "hdfs://hadoop-master.domain:9000/user/hive/warehouse") \
 .enableHiveSupport() \
 .getOrCreate()

What should I do to connect from local PySpark to the remote Hive database?


Solution

  • Replace:

    .config('hive.metastore.uris', "thrift://hadoop-master.domain:9083")
    

    With:

    .config('spark.hadoop.hive.metastore.uris', "thrift://hadoop-master.domain:9083")
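
The reason this helps is that Spark copies any property prefixed with spark.hadoop. (with the prefix removed) into the Hadoop configuration, which is where the Hive client looks up hive.metastore.uris. Putting the replacement together with the builder from the question gives roughly the following. This is only a sketch: the helper names hive_metastore_conf and build_session are made up for illustration, and the host, ports and paths are the ones from the question.

```python
def hive_metastore_conf(metastore_uri, warehouse_dir):
    """Return Spark config entries pointing at a remote Hive metastore.

    Hypothetical helper, not part of PySpark.
    """
    return {
        # The spark.hadoop. prefix makes Spark pass the property
        # (minus the prefix) into the Hadoop configuration.
        "spark.hadoop.hive.metastore.uris": metastore_uri,
        "spark.sql.warehouse.dir": warehouse_dir,
    }


def build_session(app_name, conf):
    """Create a Hive-enabled SparkSession from a dict of config entries.

    Hypothetical helper, not part of PySpark.
    """
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.enableHiveSupport().getOrCreate()


# Values taken from the question; adjust them to your own cluster.
conf = hive_metastore_conf(
    "thrift://hadoop-master.domain:9083",
    "hdfs://hadoop-master.domain:9000/user/hive/warehouse",
)
```

Calling spark = build_session("test01", conf) and then spark.sql("SHOW DATABASES").show() should list the databases of the cluster if the metastore is reachable from the workstation. Note that inside the pyspark shell a session already exists, so getOrCreate() returns that one and these settings may not take effect; in that case, passing the same keys with --conf when launching pyspark is an alternative.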