apache-spark, hive, kerberos, oozie, hive-metastore

Oozie Spark action fails in a Kerberos environment


I am running a Spark job through an Oozie Spark action. The job uses HiveContext to do part of its work. The cluster is configured with Kerberos. When I submit the job with spark-submit from the console, it runs successfully, but when I run it from Oozie it fails with the following error:

    18/03/18 03:34:16 INFO metastore: Trying to connect to metastore with URI thrift://localhost.local:9083
    18/03/18 03:34:16 ERROR TSaslTransport: SASL negotiation failure
    javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
            at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
            at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
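
For context, the console submission that works looks roughly like the following (the user, paths, and resource sizes are placeholders; the shell session already has a Kerberos TGT from kinit, which is why it succeeds there):

    kinit analyzer_user@EXAMPLE.COM
    spark-submit \
      --master yarn \
      --class com.demo.analyzer \
      --jars /path/to/dependency.jar \
      --files /path/to/config.properties,/path/to/hive-site.xml \
      --num-executors 4 \
      --executor-cores 2 \
      --executor-memory 4G \
      --driver-memory 2G \
      /path/to/analyzer.jar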

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="workflow">
   <start to="Analysis" />
   <!-- Spark action that submits the analysis job. -->
   <action name="Analysis">
      <spark xmlns="uri:oozie:spark-action:0.1">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
         <master>${master}</master>
         <name>Analysis</name>
         <class>com.demo.analyzer</class>
         <jar>${appLib}</jar>
         <spark-opts>--jars ${sparkLib} --files ${config},${hivesite} --num-executors ${NoOfExecutors} --executor-cores ${ExecutorCores} --executor-memory ${ExecutorMemory} --driver-memory ${driverMemory}</spark-opts>
      </spark>
      <ok to="sendEmail" />
      <error to="fail" />
   </action>
   <action name="sendEmail">
      <email xmlns="uri:oozie:email-action:0.1">
         <to>${emailToAddress}</to>
         <subject>Output of workflow ${wf:id()}</subject>
         <body>Results from line count: ${wf:actionData('shellAction')['NumberOfLines']}</body>
      </email>
      <ok to="end" />
      <error to="end" />
   </action>
   <kill name="fail">
      <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
   </kill>
   <end name="end" />
</workflow-app>

Do I need to configure anything related to Kerberos in workflow.xml? Am I missing anything here?

Any help appreciated.

Thanks in advance.


Solution

  • You need to add hcat credentials for the thrift URI in the Oozie workflow. This enables Kerberos authentication to the metastore for the thrift URI.

    Add the below credentials tag to your Oozie workflow:

    <credentials>
        <credential name="credhive" type="hcat">
            <property>
                <name>hcat.metastore.uri</name>
                <value>${thrift_uri}</value>
            </property>
            <property>
                <name>hcat.metastore.principal</name>
                <value>${principal}</value>
            </property>
        </credential>
    </credentials>
    

    Then provide the credentials to the Spark action as below:

    <action name="Analysis" cred="credhive">
          <spark xmlns="uri:oozie:spark-action:0.1">
             <job-tracker>${jobTracker}</job-tracker>
             <name-node>${nameNode}</name-node>
             <master>${master}</master>
             <name>Analysis</name>
             <class>com.demo.analyzer</class>
             <jar>${appLib}</jar>
             <spark-opts>--jars ${sparkLib} --files ${config},${hivesite} --num-executors ${NoOfExecutors} --executor-cores ${ExecutorCores} --executor-memory ${ExecutorMemory} --driver-memory ${driverMemory}</spark-opts>
          </spark>
          <ok to="sendEmail" />
          <error to="fail" />
       </action>
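
    The cred attribute must match the name given in the <credentials> block. With it in place, Oozie fetches a Hive metastore delegation token before launching the action, and the HiveContext in the Spark job authenticates with that token instead of needing a Kerberos TGT of its own.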
    

    The thrift_uri and principal can be found in hive-site.xml. thrift_uri is set in the hive-site.xml property:

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://xxxxxx:9083</value>
      </property>
    

    The principal is set in the hive-site.xml property:

     <property>
        <name>hive.metastore.kerberos.principal</name>
        <value>hive/[email protected]</value>
      </property>
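
    Both values are usually passed in through job.properties rather than hard-coded in the workflow; a minimal sketch with placeholder host names and realm (copy the real values from your hive-site.xml):

    # job.properties -- placeholder values, take the real ones from hive-site.xml
    thrift_uri=thrift://metastore-host.example.com:9083
    principal=hive/metastore-host.example.com@EXAMPLE.COM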