I am attempting to run a Spark job (using spark2-submit) from Oozie so the job can be run on a schedule.
The job runs just fine when we run the shell script from the command line under our service account (not yarn). When we run it as an Oozie workflow, the following happens:
17/11/16 12:03:55 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: Permission denied:
user=yarn, access=WRITE, inode="/user":hdfs:supergroup:drwxrwxr-x
Oozie is running the job as the yarn user. IT has denied us any ability to change yarn's permissions in HDFS, and there is not a single reference to the /user directory in the Spark script. We have also tried to ssh into the server, but that doesn't work - we have to ssh out from our worker nodes onto the master.
The shell script:
spark2-submit --name "SparkRunner" --master yarn --deploy-mode client --class org.package-name.Runner hdfs://manager-node-hdfs/Analytics/Spark_jars/SparkRunner.jar
Any help would be appreciated.
I was able to fix this by following https://stackoverflow.com/a/32834087/8099994
At the beginning of my shell script I now include the following line:
export HADOOP_USER_NAME=serviceAccount;
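For reference, the wrapper script now looks roughly like the sketch below (serviceAccount is a placeholder for our actual account name). As I understand it, HADOOP_USER_NAME overrides the effective user for the Hadoop client libraries on clusters using simple authentication (no Kerberos), so Spark's staging directory resolves under /user/serviceAccount instead of /user/yarn, which yarn could not write to.

#!/bin/bash
# Run the Spark job as our service account instead of yarn,
# so the HDFS staging directory is created under /user/serviceAccount.
export HADOOP_USER_NAME=serviceAccount

spark2-submit --name "SparkRunner" --master yarn --deploy-mode client --class org.package-name.Runner hdfs://manager-node-hdfs/Analytics/Spark_jars/SparkRunner.jar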