Tags: python, hive, configuration, kerberos, pyhive

Connection to Hive using python and Kerberos


I'm trying to connect to Hive using Python. I have installed all of the required dependencies (sasl, thrift_sasl, etc.).

Here is how I try to connect:

    from pyhive import hive

    configuration = {
        "hive.server2.authentication.kerberos.principal": "hive/_HOST@REALM_HOST",
        "hive.server2.authentication.kerberos.keytab": "/etc/security/keytabs/hive.service.keytab",
    }

    connection = hive.Connection(configuration=configuration, host="host", port=port,
                                 auth="KERBEROS", kerberos_service_name="hiveserver2")

But I get this error:

Minor code may provide more information (Cannot find KDC for realm "REALM_DOMAIN")

What am I missing? Does anyone have an example of a PyHive connection using Kerberos?

Thank you for your help.


Solution

  • Thank you @Kishore. Actually, in PySpark the code looks like this:

    import pyspark
    from pyspark import SparkContext
    from pyspark.sql import Row
    from pyspark import SparkConf
    from pyspark.sql import HiveContext
    from pyspark.sql import functions as F
    import pyspark.sql.types as T
    
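    def connection(self):
        # Excerpt from a class (hence `self`): builds the Spark context on YARN.
        # 'yarn-client' is the Spark 1.x master string; Spark 2+ uses 'yarn'.
        conf = pyspark.SparkConf()
        conf.setMaster('yarn-client')
        sc = pyspark.SparkContext(conf=conf)
    
        # HiveContext exposes Hive tables through Spark SQL.
        self.cursor = HiveContext(sc)
    
        # Allow dynamic partitioning in non-strict mode.
        self.cursor.setConf("hive.exec.dynamic.partition", "true")
        self.cursor.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
        self.cursor.setConf("hive.warehouse.subdir.inherit.perms", "true")
        self.cursor.setConf('spark.scheduler.mode', 'FAIR')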
    

    You can then run a query with:

    rows = self.cursor.sql("SELECT someone FROM something")
    for row in rows.collect():
        print(row)
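
On Spark 2.0 and later, `HiveContext` is deprecated in favour of `SparkSession` with Hive support enabled. The setup and query above could be sketched as follows. This is only a sketch under assumptions: the helper names (`apply_settings`, `make_session`) and the application name are made up for illustration, and the settings are simply the ones listed in the answer.

```python
# Hive/Spark settings copied from the answer above.
HIVE_SETTINGS = {
    "hive.exec.dynamic.partition": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "hive.warehouse.subdir.inherit.perms": "true",
    "spark.scheduler.mode": "FAIR",
}


def apply_settings(builder, settings):
    """Call builder.config(key, value) for every setting and return the builder."""
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder


def make_session(app_name="hive-example"):  # app_name is a hypothetical default
    """Create (or reuse) a SparkSession that can read Hive tables."""
    # Imported lazily so the helpers above work without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    return apply_settings(builder, HIVE_SETTINGS).getOrCreate()
```

With this in place, `make_session().sql("SELECT someone FROM something").collect()` would play the role of `self.cursor.sql(...)` above. Passing the settings to the builder means they are in place before the session is created.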
    

    I actually run the code with this command:

    spark-submit --master yarn MyProgram.py
    

    I guess you could also run the script directly with Python, provided pyspark is installed:

    python MyProgram.py 
    

    but I haven't tried it, so I can't guarantee that it works.
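
For the PyHive route asked about in the question, a minimal sketch is given here, with several caveats. "Cannot find KDC for realm" generally means the client's Kerberos configuration (typically `/etc/krb5.conf`) has no entry for that realm, so that has to be fixed outside Python. The `hive.server2.authentication.kerberos.*` properties are HiveServer2 server-side settings; the client instead needs a valid ticket (for example from `kinit`). `kerberos_service_name` should match the first component of the server principal, which is `hive` for `hive/_HOST@...`. The helper names below are hypothetical.

```python
def service_name_from_principal(principal):
    """Return the service part of a principal such as 'hive/_HOST@REALM'."""
    primary = principal.split("@", 1)[0]  # drop the realm
    return primary.split("/", 1)[0]       # drop the host component


def kerberos_connect_kwargs(host, port, principal):
    """Build keyword arguments for pyhive.hive.Connection with Kerberos auth."""
    return {
        "host": host,
        "port": port,
        "auth": "KERBEROS",
        "kerberos_service_name": service_name_from_principal(principal),
    }


def connect(host, port, principal):
    """Open a PyHive connection; requires a valid Kerberos ticket."""
    # Imported lazily; needs pyhive, sasl and thrift_sasl to be installed.
    from pyhive import hive

    return hive.Connection(**kerberos_connect_kwargs(host, port, principal))
```

A call might look like `connect("host", 10000, "hive/_HOST@REALM_HOST")`, where 10000 is the default HiveServer2 port and may differ on your cluster.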