Tags: apache-spark, bigdata, ambari, hive-metastore, spark-thriftserver

Spark Thrift Server 3.2.2: user impersonation fails with a metastore authentication error (SASL negotiation failure, GSS initiate failed)


I am running Spark Thrift Server on a Kerberized Hadoop cluster. Without user impersonation it works well, but as soon as I enable impersonation I get an authentication error against the Hive metastore.

I followed these documents:

https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/config-sts-user-imp.html

https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.4/bk_data-access/content/ref-5422cb60-d1d5-425a-b719-ec7bd03ee5d3.1.html

Step 1:

  • Set hive.server2.enable.doAs = true in Advanced spark-hive-site-override
  • Add spark.jars = /usr/hdp/current/spark-thriftserver/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-thriftserver/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/spark-thriftserver/lib/datanucleus-rdbms-3.2.9.jar in Custom spark-thrift-sparkconf

Step 2: in Advanced hiveserver2-site

  • Set hive.security.authorization.enabled = true
  • Set hive.server2.enable.doAs = true
  • Set hive.metastore.pre.event.listeners = org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener
  • Set hive.security.metastore.authorization.manager = org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
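The settings from Step 2 can be checked against the rendered configuration file before restarting anything. The following is a minimal sketch, assuming the file uses the usual Hadoop-style `<configuration><property>` XML layout; the helper names `read_properties` and `find_mismatches` and the sample XML are hypothetical and only serve as illustration.

```python
import xml.etree.ElementTree as ET

# Settings listed in Step 2 (names and values copied from the question).
EXPECTED = {
    "hive.security.authorization.enabled": "true",
    "hive.server2.enable.doAs": "true",
    "hive.metastore.pre.event.listeners":
        "org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener",
    "hive.security.metastore.authorization.manager":
        "org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider",
}


def read_properties(xml_text):
    """Parse Hadoop-style <configuration><property> XML into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def find_mismatches(props, expected=EXPECTED):
    """Return {name: (expected, actual)} for settings that are missing or differ."""
    return {
        name: (want, props.get(name))
        for name, want in expected.items()
        if props.get(name) != want
    }


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the rendered hiveserver2-site.xml.
    sample = """
    <configuration>
      <property><name>hive.server2.enable.doAs</name><value>true</value></property>
      <property><name>hive.security.authorization.enabled</name><value>false</value></property>
    </configuration>
    """
    for name, (want, got) in find_mismatches(read_properties(sample)).items():
        print(f"{name}: expected {want!r}, found {got!r}")
```

A check like this only confirms that the values were written as intended; it says nothing about whether Spark Thrift Server honours them.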

Step 3:

  • I created a user keytab and principal and ran kinit
  • Run the CLI: beeline -u 'jdbc:hive2://<host>:<port>/default;principal=spark3/<HOST>@<REALM>;auth=KERBEROS;transportMode=binary'
 Result:
Connecting to jdbc:hive2://<host>:<port>/default;principal=spark3/<HOST>@<REALM>;auth=KERBEROS;transportMode=binary
Connected to: Spark SQL (version 3.2.2)
Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.4.0-315 by Apache Hive
  • Run the query: show databases;

The query fails with this error:

Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    ....
Caused by: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    ....
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
....
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    ....
Caused by: java.lang.reflect.InvocationTargetException
....
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: GSS initiate failed
    

The Spark Thrift Server log shows the following:

22/10/07 15:07:31 INFO ThriftCLIService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V10
22/10/07 15:07:31 INFO HiveSessionImpl: Operation log session directory is created: /tmp/spark3/operation_logs/64eb19a6-1bdc-4ed8-81c9-8881c4251e75
22/10/07 15:07:31 INFO metastore: Trying to connect to metastore with URI thrift://<host>:<port>
22/10/07 15:07:32 INFO metastore: Opened a connection to metastore, current connections: 1
22/10/07 15:07:32 INFO metastore: Connected to metastore.
22/10/07 15:07:39 INFO SparkExecuteStatementOperation: Submitting query 'show databases' with fdcf90cb-74bb-4574-99b7-bfd981ce8010
22/10/07 15:07:39 INFO SparkExecuteStatementOperation: Running query with fdcf90cb-74bb-4574-99b7-bfd981ce8010
22/10/07 15:07:39 INFO metastore: Closed a connection to metastore, current connections: 0
22/10/07 15:07:39 INFO metastore: Trying to connect to metastore with URI thrift://<host>:<port>
22/10/07 15:07:39 ERROR TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    
22/10/07 15:07:39 WARN metastore: Failed to connect to the MetaStore Server...
22/10/07 15:07:39 INFO metastore: Waiting 5 seconds before next connection attempt.
22/10/07 15:07:44 INFO metastore: Trying to connect to metastore with URI  thrift://<host>:<port>
22/10/07 15:07:44 ERROR TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
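To make the sequence in this log easier to see, the sketch below reduces it to an ordered list of metastore connection outcomes and submitted queries. It is only an illustration: the function name `summarize` is hypothetical, and the regular expressions assume the exact message wording quoted above.

```python
import re

# Patterns for the log messages quoted above (wording assumed to be stable).
ATTEMPT = re.compile(r"Trying to connect to metastore with URI")
SUCCESS = re.compile(r"Connected to metastore\.")
FAILURE = re.compile(r"SASL negotiation failure")
QUERY = re.compile(r"Submitting query '(?P<sql>[^']*)'")


def summarize(lines):
    """Return ordered events: ('connect', 'ok'|'failed'|'unknown') or ('query', sql)."""
    events = []
    pending = False  # True while a connection attempt has no recorded outcome
    for line in lines:
        if ATTEMPT.search(line):
            if pending:
                events.append(("connect", "unknown"))
            pending = True
        elif pending and SUCCESS.search(line):
            events.append(("connect", "ok"))
            pending = False
        elif pending and FAILURE.search(line):
            events.append(("connect", "failed"))
            pending = False
        else:
            match = QUERY.search(line)
            if match:
                events.append(("query", match.group("sql")))
    if pending:
        events.append(("connect", "unknown"))
    return events


if __name__ == "__main__":
    log = [
        "INFO metastore: Trying to connect to metastore with URI thrift://<host>:<port>",
        "INFO metastore: Connected to metastore.",
        "INFO SparkExecuteStatementOperation: Submitting query 'show databases' with fdcf90cb",
        "INFO metastore: Trying to connect to metastore with URI thrift://<host>:<port>",
        "ERROR TSaslTransport: SASL negotiation failure",
    ]
    for event in summarize(log):
        print(event)
```

Applied to the log above, the connection opened at session start succeeds, while every connection attempted after the query is submitted fails with the missing-TGT error. This suggests that the failing connections are made under a different security context than the first one.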

Connecting to Spark Thrift Server succeeds, but as soon as I run a query I get the error above. What am I doing wrong?


Solution

  • Spark Thrift Server is built on a single Spark application and, unfortunately, does not support impersonation yet.

    You could try Apache Kyuubi instead: https://github.com/apache/incubator-kyuubi