scala apache-spark hadoop-yarn

Spark submit truncates arguments in yarn cluster mode


I am running a Spark application on a YARN cluster in cluster deploy mode using the following command:

spark-submit --conf spark.executor.memory=24g --conf spark.master=yarn --conf spark.submit.deployMode=cluster --conf spark.executor.extraJavaOptions=-Dfile.encoding=UTF-8 --conf spark.files=file:///opt/configurations/app.conf --class com.example.HelloWorld --queue sample_q file:///opt/jars/example.jar '{"sample":{}}'

This command does not pass the entire argument to the HelloWorld class.

Main method argument passed: {"sample":{

Main method argument expected: {"sample":{}}

The same command works correctly in client deploy mode:

spark-submit --conf spark.executor.memory=24g --conf spark.master=yarn --conf spark.submit.deployMode=client --conf spark.executor.extraJavaOptions=-Dfile.encoding=UTF-8 --conf spark.files=file:///opt/configurations/app.conf --class com.example.HelloWorld --queue sample_q file:///opt/jars/example.jar '{"sample":{}}'

Inspecting the launch_container.sh script on the YARN worker node showed that the launch command in it already contained the truncated string (--arg '{\"sample\":{').
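The inspection described above can be sketched as a small helper that pulls the --arg entries out of a launch script. This is only an illustration: show_launch_args is a hypothetical name, and the location of launch_container.sh depends on the NodeManager's local directory configuration, so the path has to be supplied by hand.

```shell
# Hypothetical helper: print every --arg '<value>' entry found in a
# launch_container.sh file. The file path is passed as the first argument.
show_launch_args() {
  # -o prints only the matching part; -e keeps the pattern from being
  # parsed as an option because it starts with "--".
  grep -o -e "--arg '[^']*'" "$1"
}

# Usage (path is a placeholder, not a real location):
#   show_launch_args /path/to/container/dir/launch_container.sh
```

If the driver argument was truncated before the container started, the truncated value shows up here, which points at the launch command rather than at the application code.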

Spark Version: 2.3

Hadoop Version: 2.7.3


Solution

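  • YARN treats {{ and }} as parameter expansion markers, so any occurrence in the container launch command is handled as a reference to an environment variable and rewritten accordingly. The }} at the end of the JSON argument does not belong to any environment variable reference, so it is simply dropped, which leaves {"sample":{ as the argument.

    This only causes a problem in cluster deploy mode, because there the driver runs inside a YARN container and its arguments pass through this expansion. In client deploy mode the driver runs locally, so the argument reaches the main method unchanged.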

    Reference: YARN ApplicationConstants (PARAMETER_EXPANSION_LEFT and PARAMETER_EXPANSION_RIGHT)
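To make the effect concrete, here is a minimal sketch that imitates the marker replacement with sed. It assumes the behaviour on a non-Windows NodeManager, where {{ is turned into $ and }} is removed; expand_yarn_markers is a hypothetical helper written for this illustration and is not part of YARN or Spark.

```shell
# Sketch only: mimics the marker replacement on a non-Windows NodeManager,
# where "{{" is assumed to become "$" and "}}" is assumed to be removed.
# expand_yarn_markers is a hypothetical helper, not a YARN or Spark command.
expand_yarn_markers() {
  printf '%s\n' "$1" | sed -e 's/{{/$/g' -e 's/}}//g'
}

# The argument from the question loses its trailing "}}".
expand_yarn_markers '{"sample":{}}'

# Same JSON with a space between the closing braces: no "}}" is left to match.
expand_yarn_markers '{"sample":{} }'
```

The first call prints {"sample":{, matching the truncated value seen in launch_container.sh, while the second prints its input unchanged. Under the same assumption, one possible workaround is to pass '{"sample":{} }' as the application argument, since the extra space keeps the JSON valid and avoids the adjacent braces. This has not been verified against the Spark 2.3 and Hadoop 2.7.3 versions from the question.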