Tags: python, hadoop, hive, kerberos, pyhive

Using pyhive with kerberos ticket to connect to kerberized hadoop cluster


I would like to connect to Hive on our kerberized Hadoop cluster and run some HQL queries from a machine that already has its own Kerberos client. The client works: the keytab has been passed and tested.

Our Hadoop cluster runs HWS 3.1 on CentOS 7. My machine also runs CentOS 7, and I'm using Python 3.7.3 and PyHive 0.6.1.

While going through different forums (HWS, Cloudera, here on SO...), I installed a bunch of libraries (and also tried uninstalling them).

I installed these SASL libraries through pip:

  • pure-sasl (0.6.1)
  • pysasl (0.4.1)
  • sasl (0.2.1)
  • thrift-sasl (0.3.0)

I installed these packages through yum:

  • cyrus-sasl-2.1.26-23.el7.x86_64
  • cyrus-sasl-lib-2.1.26-23.el7.x86_64
  • cyrus-sasl-plain-2.1.26-23.el7.x86_64
  • saslwrapper-devel-0.16-5.el7.x86_64
  • saslwrapper-0.16-5.el7.x86_64
  • cyrus-sasl-lib-2.1.26-23.el7.i686
  • cyrus-sasl-devel-2.1.26-23.el7.x86_64

Below is my connection to Hive:

return hive.Connection(host=self.host, port=self.port,
       database=self.database, auth=self.__auth,
       kerberos_service_name=self.__kerberos_service_name)

This is part of my yaml

hive_interni_hdp: 
    db_type: hive 
    host: domain.xx.lan 
    database: database_name 
    user: user_name 
    port: 10000 
    auth: KERBEROS 
    kerberos_service_name: hive
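As a rough illustration of how a config entry like this could feed the connection call, here is a minimal sketch. The `CONFIG` dict is a hand-written stand-in for the parsed YAML, and `hive_connection_kwargs` and `connect` are hypothetical helper names, not part of PyHive or of the code in the question. It assumes `pyhive.hive.Connection` accepts the `host`, `port`, `database`, `auth` and `kerberos_service_name` keyword arguments already used above. The `user` value is left out here, as in the original call, on the assumption that the identity comes from the Kerberos ticket.

```python
# Hypothetical stand-in for the parsed YAML entry (values copied from the question).
CONFIG = {
    "db_type": "hive",
    "host": "domain.xx.lan",
    "database": "database_name",
    "user": "user_name",
    "port": 10000,
    "auth": "KERBEROS",
    "kerberos_service_name": "hive",
}


def hive_connection_kwargs(cfg):
    """Map a config entry to keyword arguments for pyhive.hive.Connection."""
    kwargs = {
        "host": cfg["host"],
        "port": int(cfg.get("port", 10000)),
        "database": cfg.get("database", "default"),
        "auth": cfg.get("auth"),
    }
    # Only pass the service name when Kerberos authentication is requested.
    if kwargs["auth"] == "KERBEROS":
        kwargs["kerberos_service_name"] = cfg.get("kerberos_service_name", "hive")
    return kwargs


def connect(cfg):
    """Open a Hive connection; needs pyhive, thrift-sasl and a valid ticket."""
    from pyhive import hive  # imported lazily so the mapping can be checked without pyhive

    return hive.Connection(**hive_connection_kwargs(cfg))


# Inspect the arguments without opening a connection.
print(hive_connection_kwargs(CONFIG))
```

Calling `connect(CONFIG)` would then attempt the actual connection, so it only makes sense on a machine that can reach the cluster.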

When I try to run the code, I get the following error:

  File "/opt/Python3.7.3/lib/python3.7/site-packages/dfpy/location.py", line 1647, in conn
    self.__conn = self._create_connection()
  File "/opt/Python3.7.3/lib/python3.7/site-packages/dfpy/location.py", line 1633, in _create_connection
    kerberos_service_name=self.__kerberos_service_name)
  File "/opt/Python3.7.3/lib/python3.7/site-packages/pyhive/hive.py", line 192, in __init__
    self._transport.open()
  File "/opt/Python3.7.3/lib/python3.7/site-packages/thrift_sasl/__init__.py", line 79, in open
    message=("Could not start SASL: %s" % self.sasl.getError()))
thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found'

Has anyone had any luck with this? Where is the obstacle: the PyHive libraries, or wrong Kerberos connection settings?


Solution

  • I found a solution. I checked this documentation: https://www.cyrusimap.org/sasl/sasl/sysadmin.html

    It mentions GSSAPI (with Kerberos 5, which I'm using). I then checked my machine and found that it had no GSSAPI support, using

    sasl2-shared-mechlist
    

    It stated

    GSS-SPNEGO,LOGIN,PLAIN,ANONYMOUS

    but after installing the GSSAPI library

    yum install cyrus-sasl-gssapi
    

    the mechlist shows

    GSS-SPNEGO,GSSAPI,LOGIN,PLAIN,ANONYMOUS

    Then I ran the code again and hooray, it worked!

    P.S. Don't forget to authenticate and verify that your keytab is valid:

    kinit -kt /root/user.keytab [email protected]
    klist
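The two manual checks in this answer (is GSSAPI in the mechanism list, and is there a valid ticket) could be scripted as a preflight step before connecting. The following is only a sketch under some assumptions: all function names are hypothetical, the parser only assumes that mechanism names appear as upper-case tokens separated by commas or whitespace, and the ticket check relies on MIT Kerberos `klist -s`, which runs silently and exits with status 0 when the credentials cache is readable and not expired.

```python
import re
import shutil
import subprocess


def parse_mechanisms(text):
    """Return the set of SASL mechanism names found in mechlist-style output."""
    tokens = re.split(r"[,\s]+", text.strip())
    # Keep only tokens that look like mechanism names, e.g. GSSAPI or GSS-SPNEGO.
    return {t for t in tokens if re.fullmatch(r"[A-Z0-9_-]+", t)}


def has_gssapi(mechlist_output):
    """True if GSSAPI appears as its own entry (GSS-SPNEGO alone does not count)."""
    return "GSSAPI" in parse_mechanisms(mechlist_output)


def has_valid_ticket(klist="klist"):
    """True if `klist -s` exits with status 0; False if it fails or is missing."""
    try:
        return subprocess.run([klist, "-s"]).returncode == 0
    except OSError:
        return False


def preflight():
    """Collect human-readable problems; an empty list means both checks passed."""
    problems = []
    tool = shutil.which("sasl2-shared-mechlist")
    if tool is None:
        problems.append("sasl2-shared-mechlist not found; cannot check mechanisms")
    else:
        result = subprocess.run([tool], stdout=subprocess.PIPE, universal_newlines=True)
        if not has_gssapi(result.stdout):
            problems.append("GSSAPI mechanism missing; try: yum install cyrus-sasl-gssapi")
    if not has_valid_ticket():
        problems.append("no valid Kerberos ticket; run kinit with your keytab")
    return problems


for problem in preflight():
    print(problem)
```

If `preflight()` returns an empty list, the "No worthy mechs found" error from the question is unlikely to be caused by a missing GSSAPI plugin or an expired ticket.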