I have been trying to access tables in Hive using PySpark
and after reading a few other posts, this is the way people recommend connecting to Hive, but it doesn't work. I then realized I probably need to pass my username and password, but I can't figure out how to do that. So is there a way to pass the username and password when setting up the SparkSession, or what else could the problem be?
import sys

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # create Spark session with Hive support
    spark = (SparkSession.builder
             .appName("interfacing spark sql to hive metastore without configuration file")
             .config("hive.metastore.uris", "thrift://my_server:10000")
             .enableHiveSupport()
             .getOrCreate())
    sc = spark.sparkContext
    df = sc.parallelize([(1, 2, 3, 'a b c'), (4, 5, 6, 'd e f'), (7, 8, 9, 'g h i')]) \
           .toDF(['col1', 'col2', 'col3', 'col4'])
    df.write.mode("overwrite").saveAsTable("test_spark")
Traceback
Exception in thread "main" org.apache.spark.SparkException: Application application_1575789516697_258641 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1168)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:780)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Spark connects to Hive directly; there is no need to pass a username and password. Just pass the hive-site.xml file
while submitting the Spark application.
Use the code below:
from pyspark.sql import SparkSession
sparkSession = SparkSession.builder.appName("ApplicationName").enableHiveSupport().getOrCreate()
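For illustration, once the session is created with Hive support, tables can be written and queried directly — no separate credentials are passed from the code. This is a sketch only (the table name `test_spark` is taken from your question, and it assumes the metastore is reachable via the shipped hive-site.xml):

```python
from pyspark.sql import SparkSession

# Hive-enabled session; metastore details come from hive-site.xml
spark = SparkSession.builder.appName("ApplicationName").enableHiveSupport().getOrCreate()

# Write a DataFrame as a Hive table, then read it back with SQL
df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'val'])
df.write.mode("overwrite").saveAsTable("test_spark")
spark.sql("SELECT * FROM test_spark").show()
```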
While submitting your application, pass the hive-site.xml file as follows:
spark-submit --files /<location>/hive-site.xml --py-files <List_of_Pyfiles> <main_script>.py
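One more thing worth checking (an assumption, since your cluster details aren't shown): hive.metastore.uris should point at the Hive metastore thrift service, which listens on port 9083 by default; port 10000 is HiveServer2, not the metastore. If you prefer setting the URI programmatically instead of shipping hive-site.xml, it would look like this (my_server and 9083 are placeholders for your environment):

```python
from pyspark.sql import SparkSession

# 9083 is the default Hive metastore thrift port (10000 is HiveServer2)
spark = (SparkSession.builder
         .appName("ApplicationName")
         .config("hive.metastore.uris", "thrift://my_server:9083")
         .enableHiveSupport()
         .getOrCreate())
```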