Trying to run a Spark job with masterURL=yarn-client, using SparkLauncher 2.10. The Java code is wrapped in a NiFi processor, and NiFi is currently running as root. When I do yarn application -list, I see the Spark job started with USER = root. I want to run it with USER = hive.
Following is my SparkLauncher code.
import org.apache.spark.launcher.SparkLauncher;

// Builds and launches a spark-submit child process.
Process spark = new SparkLauncher()
        .setSparkHome(cp.fetchProperty(GlobalConstant.spark_submit_work_dir).toString())
        .setAppResource(cp.fetchProperty(GlobalConstant.spark_app_resource))
        .setMainClass(cp.fetchProperty(GlobalConstant.spark_main_class))
        .addAppArgs(ps.getName())
        // .setConf(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, "-Duser.name=hive")
        .setConf(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS,
                "-Dlog4j.configuration=file:///opt/eim/log4j_submitgnrfromhdfs.properties")
        .setVerbose(true)
        .launch();
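One note on the snippet: launch() returns a plain java.lang.Process, so the submit is fire-and-forget unless it is waited on. A minimal sketch of blocking until spark-submit exits (waitFor() throws InterruptedException, so handle or declare it):

// Block until the spark-submit child process exits and surface failures.
int exitCode = spark.waitFor();
if (exitCode != 0) {
    throw new IllegalStateException("spark-submit exited with code " + exitCode);
}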
Do I need to pass the user as a driver extra Java option? The environment is non-Kerberos. I read somewhere that the user name had to be passed as a driver extra Java option, but I cannot find that post now!
export HADOOP_USER_NAME=hive worked. SparkLauncher has an overload that accepts a Map of environment variables. As for spark.yarn.principal: the environment is non-Kerberos, and as per my reading that property only works with Kerberos. I did the following:
// Same launcher as before, but the Map constructor sets extra environment
// variables (here HADOOP_USER_NAME) on the spark-submit child process.
Process spark = new SparkLauncher(getEnvironmentVar(ps.getRunAs()))
        .setSparkHome(cp.fetchProperty(GlobalConstant.spark_submit_work_dir).toString())
        .setAppResource(cp.fetchProperty(GlobalConstant.spark_app_resource))
        .setMainClass(cp.fetchProperty(GlobalConstant.spark_main_class))
        .addAppArgs(ps.getName())
        // .setConf(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, "-Duser.name=hive")
        .setConf(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS,
                "-Dlog4j.configuration=file:///opt/eim/log4j_submitgnrfromhdfs.properties")
        .setVerbose(true)
        .launch();
Instead of new SparkLauncher(), I used SparkLauncher(java.util.Map<String,String> env) and added (or replaced) HADOOP_USER_NAME=hive in that map; a sketch of the helper that builds it follows below.
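getEnvironmentVar is my own helper and is not shown above; a minimal sketch of what it can look like, assuming ps.getRunAs() returns the target user name (e.g. "hive"):

import java.util.HashMap;
import java.util.Map;

// Builds the env map handed to the SparkLauncher(Map) constructor.
// With simple (non-Kerberos) authentication, Hadoop resolves the effective
// user from the HADOOP_USER_NAME environment variable, so setting it makes
// spark-submit (and hence the YARN application) run as that user.
private Map<String, String> getEnvironmentVar(String runAs) {
    Map<String, String> env = new HashMap<>();
    env.put("HADOOP_USER_NAME", runAs); // e.g. "hive"
    return env;
}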
Checked yarn application -list; the application now launches as intended with USER = hive.