I wrote a Spark Streaming program that inserts data into a Kerberos-enabled HBase cluster. In one batch, one task failed with the error below:
java.io.IOException: Login failure for [email protected] from keytab ./user.keytab
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1160)
at com.framework.common.HbaseUtil$.InsertToHbase(HbaseUtil.scala:81)
at com.framework.realtime.RDDUtil$$anonfun$dwsTodwd$2.apply(RDDUtil.scala:203)
at com.framework.realtime.RDDUtil$$anonfun$dwsTodwd$2.apply(RDDUtil.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.security.auth.login.LoginException: Receive timed out
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:767)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:762)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:690)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687)
at javax.security.auth.login.LoginContext.login(LoginContext.java:595)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1149)
... 13 more
Caused by: java.net.SocketTimeoutException: Receive timed out
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
at java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:146)
at java.net.DatagramSocket.receive(DatagramSocket.java:816)
at sun.security.krb5.internal.UDPClient.receive(NetClient.java:207)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:390)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:343)
at java.security.AccessController.doPrivileged(Native Method)
at sun.security.krb5.KdcComm.send(KdcComm.java:327)
at sun.security.krb5.KdcComm.send(KdcComm.java:219)
at sun.security.krb5.KdcComm.send(KdcComm.java:191)
at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:319)
at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:364)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:735)
... 25 more
But on the second attempt, the task succeeded. My guess is that the authentication took too long on the first attempt and timed out, while on the retry it was fast enough to succeed. Am I correct? Either way, how can I solve this problem? My code is below:
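import java.security.PrivilegedAction
import org.apache.hadoop.hbase.client.{HConnection, HConnectionManager, HTableInterface}
import org.apache.hadoop.security.UserGroupInformation

val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(princ, keytab)
ugi.doAs(new PrivilegedAction[Unit]() {
  def run(): Unit = {
    var conn: HConnection = null
    var htable: HTableInterface = null
    try {
      conn = HConnectionManager.createConnection(conf)
      htable = conn.getTable(tableName)
      // Buffer puts on the client instead of sending one RPC per record
      htable.setAutoFlushTo(false)
      for (record <- partitionOfRecords) {
        htable.put(record)
      }
      // Send any puts still sitting in the write buffer
      htable.flushCommits()
    } finally {
      // Release the table and connection even if a put fails
      if (htable != null) htable.close()
      if (conn != null) conn.close()
    }
  }
})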
From "Hadoop and Kerberos - the Madness beyond the Gate", chapter "Error Messages to Fear":
Receive timed out
Usually in a stack trace like
Caused by: java.net.SocketTimeoutException: Receive timed out
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
...
at sun.security.krb5.internal.UDPClient.receive(NetClient.java:207)
... UDP socket ... Switch to TCP; at the very least, it will fail faster.
And just above that:
Switching kerberos to use TCP rather than UDP
In /etc/krb5.conf:
[libdefaults]
udp_preference_limit = 1
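If editing /etc/krb5.conf on every node is not practical, the JVM can be pointed at a private copy instead. The following is only a sketch under a few assumptions: Krb5Tcp is a hypothetical helper (not part of Hadoop or Spark), the source and target paths are whatever you choose, and install runs before the first Kerberos login in that JVM. The only real API relied on is the standard JDK system property java.security.krb5.conf.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper, not part of Hadoop or Spark.
object Krb5Tcp {
  private val Setting = "udp_preference_limit = 1"

  // Returns the config text with udp_preference_limit = 1 under [libdefaults].
  def forceTcp(conf: String): String = {
    // Drop any existing udp_preference_limit lines so the setting is not duplicated.
    val lines = conf.split("\n", -1).toList
      .filterNot(_.trim.startsWith("udp_preference_limit"))
    val idx = lines.indexWhere(_.trim == "[libdefaults]")
    if (idx >= 0) {
      // Insert the setting directly below the section header.
      val (head, tail) = lines.splitAt(idx + 1)
      (head ++ List("  " + Setting) ++ tail).mkString("\n")
    } else {
      // No [libdefaults] section yet: prepend one.
      (List("[libdefaults]", "  " + Setting) ++ lines).mkString("\n")
    }
  }

  // Writes a patched copy of `source` to `target` and points this JVM at it.
  def install(source: Path, target: Path): Unit = {
    val original = new String(Files.readAllBytes(source), StandardCharsets.UTF_8)
    Files.write(target, forceTcp(original).getBytes(StandardCharsets.UTF_8))
    System.setProperty("java.security.krb5.conf", target.toString)
  }
}
```

For Spark executors, the same property can usually be passed on the command line instead, as -Djava.security.krb5.conf=... in spark.executor.extraJavaOptions, which avoids any ordering concerns inside the task code.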
Generally speaking, many erratic Kerberos issues seem to occur only with UDP, so it's unfortunate that it's used by default.
There is also a kdc_timeout configuration parameter, but it's a dirty mess: ...
If several KDCs are listed in krb5.conf (or implicitly listed via a DNS alias set with a round-robin rule, for example), then in case of "KDC timeout" Java should retry with the next KDC in line, unless you have reached a global time-out.
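Since the second attempt succeeded in the question, a stopgap is to retry the login inside the task rather than letting the whole task fail on one KDC timeout. This is a minimal sketch, not a fix for the underlying UDP problem: Retry is a hypothetical helper (not part of Hadoop or Spark), and the attempt count and delay are placeholder values to tune.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of Hadoop or Spark.
object Retry {
  // Runs `op` up to `maxAttempts` times, sleeping `delayMs` after each failed attempt.
  // Rethrows the last failure if every attempt fails.
  def withRetry[T](maxAttempts: Int, delayMs: Long)(op: => T): T = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")
    Try(op) match {
      case Success(value) => value
      case Failure(_) if maxAttempts > 1 =>
        Thread.sleep(delayMs)
        withRetry(maxAttempts - 1, delayMs)(op)
      case Failure(error) => throw error
    }
  }
}
```

With the question's code, the login line would become something like val ugi = Retry.withRetry(3, 2000L) { UserGroupInformation.loginUserFromKeytabAndReturnUGI(princ, keytab) }, so a single "Receive timed out" is retried within the same task attempt.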