Tags: python, apache-spark, pyspark, hive, apache-spark-sql

How to setup connection to HIVE using PySpark and SparkSession (How do I add username and password)?


I have been trying to access tables in Hive using PySpark. After reading a few other posts, this is the way people recommend connecting to Hive, but it doesn't work. I then realized I probably need to pass my username and password, but I can't figure out how to do it. So is there a way to pass the username and password when setting up the SparkSession, or what else could be the problem?

import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

if __name__ == "__main__":

    # create Spark session with Hive support
    spark = (SparkSession.builder
             .appName("interfacing spark sql to hive metastore without configuration file")
             .config("hive.metastore.uris", "thrift://my_server:10000")
             .enableHiveSupport()
             .getOrCreate())
    sc = spark.sparkContext
    df = sc.parallelize([(1, 2, 3, 'a b c'), (4, 5, 6, 'd e f'), (7, 8, 9, 'g h i')]) \
           .toDF(['col1', 'col2', 'col3', 'col4'])
    df.write.mode("overwrite").saveAsTable("test_spark")

Traceback

Exception in thread "main" org.apache.spark.SparkException: Application application_1575789516697_258641 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1168)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:780)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Solution

  • Spark connects to Hive directly. There is no need to pass a username and password; just pass the hive-site.xml file when submitting the Spark application.

    Use the code below:

    from pyspark.sql import SparkSession

    sparkSession = SparkSession.builder.appName("ApplicationName").enableHiveSupport().getOrCreate()

    When submitting your application, pass the hive-site.xml file like this:

    spark-submit --files /<location>/hive-site.xml --py-files <List_of_Pyfiles> <main_script>.py
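For reference, hive-site.xml is the file that tells Spark where the Hive metastore lives. A minimal sketch of such a file is shown below; the hostname `metastore_host` is a placeholder, and 9083 is the conventional Hive metastore Thrift port (note that 10000, used in the question, is typically the HiveServer2 port, not the metastore port). Substitute the values for your own cluster.

```xml
<!-- hive-site.xml: minimal sketch; host is a placeholder -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- 9083 is the default Hive metastore Thrift port -->
    <value>thrift://metastore_host:9083</value>
  </property>
</configuration>
```

With this file distributed via `--files`, `enableHiveSupport()` picks up the metastore location without any credentials being set in code.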