java, apache-spark, hadoop-yarn

Spark on YARN "user.dir"


I have an external API jar that looks for its dependencies at the path given by the Java system property

user.dir

We can consume the API in spark-shell local mode by placing the dependencies in the directory the shell is invoked from. My question: when I submit the job to a YARN cluster, I am unable to use the API because it cannot resolve its runtime dependencies, even though I have placed them in HDFS at

/user/username/

What am I doing wrong here? Is there a way to customize user.dir for a spark-submit job?
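
For context, user.dir is the JVM's working directory, not an HDFS path, and on YARN each driver and executor container gets its own transient working directory. Below is a minimal sketch (a hypothetical probe application, not the asker's code) that prints what user.dir resolves to on the driver and on each executor:

    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.SparkSession;

    import java.util.Arrays;

    public class UserDirProbe {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("user.dir probe")
                    .getOrCreate();

            // On the driver: in yarn cluster mode this is the ApplicationMaster's
            // container directory, not the directory spark-submit was run from.
            System.out.println("driver user.dir = " + System.getProperty("user.dir"));

            JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

            // On each executor: a YARN container working directory that differs
            // per node and per application run.
            jsc.parallelize(Arrays.asList(1, 2, 3, 4), 4)
               .map(i -> System.getProperty("user.dir"))
               .collect()
               .forEach(dir -> System.out.println("executor user.dir = " + dir));

            spark.stop();
        }
    }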


Solution

  • Just dropping this here for anyone who gets stuck: the Spark driver spawns executors on different nodes, and there is no definite, consistent way to know an executor's working directory before it is spawned. I was therefore better off bundling the artifact into the JAR itself, or pulling it in as a dependency; see the sketch below.
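
As an illustration of the bundling approach, here is a minimal sketch that loads a dependency file packaged inside the application JAR (e.g. under src/main/resources) via the classloader, so the lookup no longer depends on the container's working directory. The resource name config/deps.properties is a hypothetical placeholder:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Properties;

    public class BundledResourceLoader {

        // Read a resource that was packaged into the application JAR, so the
        // lookup works regardless of which node or container is executing.
        static Properties loadBundled(String resourceName) throws IOException {
            try (InputStream in = BundledResourceLoader.class
                    .getClassLoader()
                    .getResourceAsStream(resourceName)) {
                if (in == null) {
                    throw new IOException("resource not on classpath: " + resourceName);
                }
                Properties props = new Properties();
                props.load(in);
                return props;
            }
        }

        public static void main(String[] args) throws IOException {
            // "config/deps.properties" stands in for whatever file the API expects.
            Properties props = loadBundled("config/deps.properties");
            props.forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }

Alternatively, files passed to spark-submit with --files are localized into each container's working directory, which is what user.dir resolves to inside a YARN container, so that option can also make such dependencies visible to the API.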