Tags: apache-spark, jdbc, hadoop-yarn, impala

Impala JDBC connection issue in Spark cluster mode


The Impala JDBC connection throws the exception below when the Spark job runs in cluster mode. The job creates a Hive table and then invalidates/refreshes the corresponding Impala table over JDBC. The same job runs successfully in Spark client mode.

java.sql.SQLException: [Simba][ImpalaJDBCDriver](500164) Error initialized or created transport for authentication: [Simba][ImpalaJDBCDriver](500169) Unable to connect to server: GSS initiate failed.
    at com.cloudera.hivecommon.api.HiveServer2ClientFactory.createTransport(Unknown Source)
    at com.cloudera.hivecommon.api.HiveServer2ClientFactory.createClient(Unknown Source)
    at com.cloudera.hivecommon.core.HiveJDBCCommonConnection.connect(Unknown Source)
    at com.cloudera.impala.core.ImpalaJDBCConnection.connect(Unknown Source)
    at com.cloudera.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source)
    at com.cloudera.jdbc.common.AbstractDriver.connect(Unknown Source)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:270)

Solution

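    import java.security.PrivilegedAction
    import java.sql.{Connection, DriverManager}
    import org.apache.hadoop.security.UserGroupInformation

    // Opens the JDBC connection inside doAs so that the connection attempt
    // runs as the Kerberos-authenticated Hadoop login user.
    protected def getImpalaConnection(impalaJdbcDriver: String, impalaJdbcUrl: String): Connection = {
      if (impalaJdbcDriver.isEmpty) return null
      try {
        // Load the driver class so that it is available to DriverManager.
        Class.forName(impalaJdbcDriver).newInstance()
        UserGroupInformation.getLoginUser.doAs(
          new PrivilegedAction[Connection] {
            override def run(): Connection = DriverManager.getConnection(impalaJdbcUrl)
          }
        )
      } catch {
        case e: Exception =>
          // Log the failure with its stack trace, then rethrow to fail the job.
          println(e.toString + " --> " + e.getStackTrace.mkString("\n"))
          throw e
      }
    }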
    
    val impalaJdbcDriver = "com.cloudera.impala.jdbc41.Driver"
    val impalaJdbcUrl = "jdbc:impala://<Impala_Host>:21050/default;AuthMech=1;SSL=1;KrbRealm=HOST.COM;KrbHostFQDN=_HOST;KrbServiceName=impala;REQUEST_POOL=xyz"

    println("Start impala connection")
    val impalaConnection = getImpalaConnection(impalaJdbcDriver, impalaJdbcUrl)

    // Simple test query to confirm that the connection works.
    val result = impalaConnection.createStatement.executeQuery("SELECT COUNT(1) FROM testTable")
    println("End impala connection")
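The connection URL above packs several settings into one string. A minimal sketch that makes the parts explicit, using only the property names and values that appear in that URL; the helper name `buildImpalaJdbcUrl` and its parameters are hypothetical and not part of the original answer:

```scala
// Hypothetical helper: assembles the Impala JDBC URL from its parts.
// Property names and fixed values are taken from the URL in the answer.
def buildImpalaJdbcUrl(host: String, realm: String, requestPool: String,
                       port: Int = 21050, database: String = "default"): String = {
  val props = Seq(
    "AuthMech" -> "1",           // 1 selects Kerberos authentication in the Cloudera driver
    "SSL" -> "1",                // enable SSL for the connection
    "KrbRealm" -> realm,         // Kerberos realm of the Impala service
    "KrbHostFQDN" -> "_HOST",    // kept exactly as in the answer
    "KrbServiceName" -> "impala",
    "REQUEST_POOL" -> requestPool
  )
  // Properties are appended after the database name, separated by semicolons.
  s"jdbc:impala://$host:$port/$database;" +
    props.map { case (key, value) => s"$key=$value" }.mkString(";")
}
```

Calling `buildImpalaJdbcUrl("<Impala_Host>", "HOST.COM", "xyz")` should reproduce the URL string used above, so the realm and request pool can come from job configuration instead of being hard-coded.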
    

Build a fat (uber) jar and submit it with the spark-submit command below. You can pass additional options such as --files or --jars if needed.

    Spark submit command:

    spark-submit --master yarn-cluster --keytab /home/testuser/testuser.keytab --principal testuser@host.COM --queue xyz --class com.dim.UpdateImpala <application-jar>
    

Adjust the UserGroupInformation call in getImpalaConnection to match your Spark version:

    For Spark 1: UserGroupInformation.getLoginUser
    For Spark 2: UserGroupInformation.getCurrentUser
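
The question describes invalidating or refreshing the Impala table after the Hive write, while the answer only runs a test query. Once the connection works in cluster mode, that step could be sketched as follows. This is an illustration under assumptions, not the original author's code: the helper names `metadataStatements` and `refreshImpalaTable` and the `newTable` flag are hypothetical, and the connection is assumed to come from `getImpalaConnection`.

```scala
import java.sql.Connection

// Hypothetical helper: picks the Impala statement that makes a table
// written through Hive visible to Impala.
def metadataStatements(table: String, newTable: Boolean): Seq[String] =
  if (newTable) Seq(s"INVALIDATE METADATA $table") // table was created outside Impala
  else Seq(s"REFRESH $table")                      // existing table, new data files only

// Runs the statements on the given connection and always closes the statement.
def refreshImpalaTable(conn: Connection, table: String, newTable: Boolean): Unit = {
  val stmt = conn.createStatement()
  try metadataStatements(table, newTable).foreach(sql => stmt.execute(sql))
  finally stmt.close()
}
```

With the connection from the answer, a call might look like `refreshImpalaTable(impalaConnection, "default.testTable", newTable = true)`. Closing the statement in `finally` avoids leaking it if the refresh fails.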