I am running a Spark application on a YARN cluster in cluster deploy mode using the following command:
spark-submit --conf spark.executor.memory=24g --conf spark.master=yarn --conf spark.submit.deployMode=cluster --conf spark.executor.extraJavaOptions=-Dfile.encoding=UTF-8 --conf spark.files=file:///opt/configurations/app.conf --class com.example.HelloWorld --queue sample_q file:///opt/jars/example.jar '{"sample":{}}'
This command does not pass the entire argument to the HelloWorld class.
Main method argument passed: {"sample":{
Main method argument expected: {"sample":{}}
The same command works correctly in client deploy mode:
spark-submit --conf spark.executor.memory=24g --conf spark.master=yarn --conf spark.submit.deployMode=client --conf spark.executor.extraJavaOptions=-Dfile.encoding=UTF-8 --conf spark.files=file:///opt/configurations/app.conf --class com.example.HelloWorld --queue sample_q file:///opt/jars/example.jar '{"sample":{}}'
Upon inspecting the launch_container.sh script on the YARN worker node, I found that the command in it also contained the truncated string (--arg '{\"sample\":{').
Spark Version: 2.3
Hadoop Version: 2.7.3
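YARN treats {{ and }} as parameter expansion markers, so any occurrence is interpreted as an environment variable reference and replaced with the corresponding value. Since there is no matching environment variable here, the trailing }} is dropped and the argument is truncated.
This only causes an issue in cluster deploy mode, because the driver runs inside the YARN cluster and its arguments go through launch_container.sh.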
Reference: YarnApplicationConstants
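The substitution described above can be modelled in a few lines of Python to see why the argument loses its last two braces. This is only a simplified sketch, not YARN's actual code: the function name expand_like_yarn is hypothetical, and it assumes a Linux NodeManager where {{ is turned into $ and }} is replaced with an empty string. It also shows one possible workaround, which is to put a space between the two closing braces; JSON allows whitespace there, so the parsed value should be unchanged.

```python
import json

# Markers that YARN treats as parameter expansion delimiters
# (per the explanation above).
LEFT, RIGHT = "{{", "}}"


def expand_like_yarn(arg: str) -> str:
    """Hypothetical, simplified model of the expansion step.

    Assumption: on Linux, '{{' becomes '$' and '}}' is removed,
    so '{{VAR}}' would turn into '$VAR'.
    """
    return arg.replace(LEFT, "$").replace(RIGHT, "")


original = '{"sample":{}}'   # argument from the question
spaced = '{"sample":{} }'    # same JSON, but no adjacent closing braces

print(expand_like_yarn(original))  # the trailing '}}' is stripped
print(expand_like_yarn(spaced))    # contains no '}}', so it is left alone

# Both strings describe the same JSON document.
print(json.loads(original) == json.loads(spaced))
```

If this model matches what your cluster does, passing '{"sample":{} }' as the application argument should reach the main method intact in cluster deploy mode; it is worth confirming against launch_container.sh on your own setup.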