I have a Hadoop cluster with Hive and Spark installed. In addition, I have a separate workstation machine, and I am trying to connect to the cluster from it.
I installed Spark on this machine and tried to connect using the following command:
pyspark --name testjob --master spark://hadoop-master.domain:7077
As a result, I see the running application on the Spark WebUI page.
I want to connect to the Hive database (in the cluster) from my workstation, but I can't make it work. I have a hive-site.xml config in my Spark conf directory on the local workstation with the following contents:
<configuration>
  <property>
    <name>metastore.thrift.uris</name>
    <value>thrift://hadoop-master.domain:9083</value>
    <description>IP address (or domain name) and port of the metastore host</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://hadoop-master.domain:9000/user/hive/warehouse</value>
    <description>Warehouse location</description>
  </property>
  <property>
    <name>metastore.warehouse.dir</name>
    <value>hdfs://hadoop-master.domain:9000/user/hive/warehouse</value>
    <description>Warehouse location</description>
  </property>
  <property>
    <name>spark.sql.hive.metastore.version</name>
    <value>3.1.0</value>
    <description>Metastore version</description>
  </property>
</configuration>
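A quick way to confirm that the metastore port is even reachable from the workstation (host and port taken from the config above) is a plain socket check along these lines:

import socket

# Rough reachability check for the Hive metastore thrift endpoint;
# host and port match metastore.thrift.uris in hive-site.xml above.
with socket.create_connection(("hadoop-master.domain", 9083), timeout=5):
    print("metastore port 9083 is reachable")
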
I tried this construction, but I can't make it work with the external Hive databases:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName('test01') \
    .config('hive.metastore.uris', "thrift://hadoop-master.domain:9083") \
    .config("spark.sql.warehouse.dir", "hdfs://hadoop-master.domain:9000/user/hive/warehouse") \
    .enableHiveSupport() \
    .getOrCreate()
What should I do to connect from my local pyspark to the remote Hive database?
Replace:
.config('hive.metastore.uris', "thrift://hadoop-master.domain:9083")
With:
.config('spark.hadoop.hive.metastore.uris', "thrift://hadoop-master.domain:9083")
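For completeness, here is a minimal sketch of the full session with that change applied (hostnames reused from the question; the SHOW DATABASES call at the end is just a smoke test I would add to verify the connection):

from pyspark.sql import SparkSession

# The spark.hadoop. prefix makes Spark copy the property into the underlying
# Hadoop Configuration, which is where the Hive client actually reads
# hive.metastore.uris from.
spark = SparkSession \
    .builder \
    .appName('test01') \
    .config('spark.hadoop.hive.metastore.uris', 'thrift://hadoop-master.domain:9083') \
    .config('spark.sql.warehouse.dir', 'hdfs://hadoop-master.domain:9000/user/hive/warehouse') \
    .enableHiveSupport() \
    .getOrCreate()

# Smoke test: list the databases registered in the remote metastore.
spark.sql('SHOW DATABASES').show()

If it lists more than the local default database, the session is talking to the remote metastore.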