Tags: java, hadoop-yarn, apache-nifi, spark-launcher

SparkLauncher: run spark-submit in yarn-client mode as user hive


I am trying to run a Spark job with masterURL=yarn-client, using SparkLauncher 2.10. The Java code is wrapped in a NiFi processor, and NiFi is currently running as root. When I do yarn application -list, I see the Spark job started with USER = root. I want to run it with USER = hive. Following is my SparkLauncher code.

Process spark = new SparkLauncher()
    .setSparkHome(cp.fetchProperty(GlobalConstant.spark_submit_work_dir).toString())
    .setAppResource(cp.fetchProperty(GlobalConstant.spark_app_resource))
    .setMainClass(cp.fetchProperty(GlobalConstant.spark_main_class))
    .addAppArgs(ps.getName())
    //   .setConf(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS,"-Duser.name=hive")
    .setConf(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, "-Dlog4j.configuration=file:///opt/eim/log4j_submitgnrfromhdfs.properties")
    .setVerbose(true)
    .launch();

Do I need to pass the user as a driver extra Java option? The environment is non-Kerberos. I read somewhere that the user name has to be passed as a driver extra Java option, but I cannot find that post now.


Solution

  • export HADOOP_USER_NAME=hive worked. SparkLauncher has an overload that accepts a Map of environment variables. As for spark.yarn.principal: the environment is non-Kerberos, and from my reading spark.yarn.principal only applies with Kerberos. I did the following (a sketch of the environment-map helper is at the end of this answer):

    // SparkLauncher(env) runs spark-submit with the supplied environment,
    // here carrying HADOOP_USER_NAME for the target user
    Process spark = new SparkLauncher(getEnvironmentVar(ps.getRunAs()))
                            .setSparkHome(cp.fetchProperty(GlobalConstant.spark_submit_work_dir).toString())
                            .setAppResource(cp.fetchProperty(GlobalConstant.spark_app_resource))
                            .setMainClass(cp.fetchProperty(GlobalConstant.spark_main_class))
                            .addAppArgs(ps.getName())
                            //   .setConf(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS,"-Duser.name=hive")
                            .setConf(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, "-Dlog4j.configuration=file:///opt/eim/log4j_submitgnrfromhdfs.properties")
                            .setVerbose(true)
                            .launch();
    

    Instead of new SparkLauncher(), I used SparkLauncher(java.util.Map<String,String> env) and added or replaced HADOOP_USER_NAME=hive in that map. Verified with yarn application -list: the job now launches, as intended, with USER=hive.
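
    For reference, here is a minimal sketch of what the environment-map helper could look like. getEnvironmentVar and ps.getRunAs() come from my own wrapper code, so the name and signature are illustrative, not part of the SparkLauncher API:

        import java.util.HashMap;
        import java.util.Map;

        // Illustrative helper: builds the environment passed to SparkLauncher(env).
        // runAs is the user the job should run as, e.g. "hive".
        private Map<String, String> getEnvironmentVar(String runAs) {
            Map<String, String> env = new HashMap<>(System.getenv()); // inherit the current environment
            env.put("HADOOP_USER_NAME", runAs);                       // add or replace HADOOP_USER_NAME
            return env;
        }

    On a non-Kerberos cluster, the Hadoop client libraries pick the effective user from HADOOP_USER_NAME, which is why setting it in the launcher's environment changes the USER reported by yarn application -list.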