scala · apache-spark · hadoop-yarn · kerberos

Kerberos issue on Spark when in cluster (YARN) mode


I am using Spark with Kerberos authentication.

I can run my code fine using spark-shell, and I can also use spark-submit in local mode (e.g. --master local[16]). Both work as expected.

Local mode:

spark-submit --class "graphx_sp" --master local[16] --driver-memory 20G target/scala-2.10/graphx_sp_2.10-1.0.jar

I am now progressing to run in cluster mode using YARN.

From here I can see that you need to specify the location of the keytab and the principal. Thus:

spark-submit --class "graphx_sp" --master yarn  --keytab /path/to/keytab --principal login_node  --deploy-mode cluster --executor-memory 13G --total-executor-cores 32 target/scala-2.10/graphx_sp_2.10-1.0.jar

However, this returns:

Exception in thread "main" java.io.IOException: Login failure for login_node from keytab /path/to/keytab: javax.security.auth.login.LoginException: Unable to obtain password from user

    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:987)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:564)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user

    at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:978)
    ... 4 more

Before running spark-shell, or spark-submit in local mode, I do the following Kerberos setup:

kinit -k -t ~/keytab -r 7d `whoami`

Clearly, this setup does not carry over to YARN. How do I fix the Kerberos issue with YARN in cluster mode? Is this something that must go in my /src/main/scala/graphx_sp.scala file?
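For reference, if the login had to happen in application code rather than via spark-submit flags, a minimal sketch using Hadoop's UserGroupInformation would look like the following (the principal and keytab path are hypothetical placeholders; as the solution below shows, the spark-submit flags turned out to be sufficient):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.UserGroupInformation

    // Hypothetical sketch: log in from a keytab inside the application itself.
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(hadoopConf)

    // Placeholder principal and keytab path, not values from the question.
    UserGroupInformation.loginUserFromKeytab("user@login_node", "/path/to/keytab")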

Update

By running kinit -V -k -t ~/keytab -r 7d `whoami` in verbose mode I was able to see that the principal was in the form user@node.

I updated this, checked the location of the keytab, and things passed through this checkpoint successfully:

INFO security.UserGroupInformation: Login successful for user user@login_node using keytab file /path/to/keytab

However, it then fails further on with:

client token: N/A
     diagnostics: User class threw exception: org.apache.hadoop.security.AccessControlException: Authentication required

I have checked the permissions on the keytab and the read permissions are correct. It has been suggested that the next possibility is a corrupt keytab (see the check below).
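One way to sanity-check a keytab (assuming the MIT Kerberos client tools are installed) is to list its entries and confirm that the stored principal matches what is passed to --principal:

    # -k reads from a keytab, -t prints entry timestamps, -e shows encryption types
    klist -k -t -e /path/to/keytab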


Solution

  • We found out that the Authentication required error happens when the application tries to read from HDFS. Because Scala evaluates lazily, the job did not fail until it actually started processing the file, which was read from HDFS via webhdfs://name:50070 (a minimal sketch of this behaviour follows below).

    Since WebHDFS exposes a public HTTP REST API for access, I thought the problem was ACL-related, but enabling ui.view.acls didn't fix the issue. Adding --conf spark.yarn.access.namenodes=webhdfs://name:50070 fixed the problem: this setting takes a comma-separated list of secure HDFS namenodes that the Spark application is going to access, and Spark acquires security tokens for each of them so the application can reach those remote HDFS clusters. This resolved the Authentication required error (a combined spark-submit command is sketched at the end).

    Alternatively, direct HDFS access (hdfs://file) works and authenticates via Kerberos, with the principal and keytab passed during spark-submit.
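    For illustration, a minimal sketch of the lazy-evaluation behaviour described above (paths and the app name are the question's placeholders, not verified values):

        import org.apache.spark.{SparkConf, SparkContext}

        val conf = new SparkConf().setAppName("graphx_sp")
        val sc   = new SparkContext(conf)

        // No data is read here: textFile only records the RDD's lineage,
        // so a missing security token does not surface at this point.
        val lines = sc.textFile("webhdfs://name:50070/path/to/input")

        // The first action forces the actual read from WebHDFS; this is where
        // the AccessControlException: Authentication required appeared.
        println(lines.count())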
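    Putting the fix together with the cluster-mode command from the question (keeping its placeholder paths and names; note that --total-executor-cores applies to standalone/Mesos masters, while YARN uses --num-executors and --executor-cores):

        spark-submit --class "graphx_sp" --master yarn --deploy-mode cluster \
            --keytab /path/to/keytab --principal user@login_node \
            --conf spark.yarn.access.namenodes=webhdfs://name:50070 \
            --executor-memory 13G \
            target/scala-2.10/graphx_sp_2.10-1.0.jar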