I am trying to run the spark-submit command on my Hadoop cluster.
I am trying to run one of the Spark examples using the following spark-submit command:
spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.3.jar 10
I get the following error:
[2022-07-25 13:32:39.253]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
I get the same error when trying to run a script with PySpark.
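For reference, the full aggregated container logs can usually be pulled with the YARN CLI, assuming log aggregation is enabled (the application ID below is a placeholder for the one printed by spark-submit):

yarn logs -applicationId <application-id>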
I have tried/verified the following:
- HADOOP_HOME, SPARK_HOME, and HADOOP_CONF_DIR have been set in my .bashrc file.
- SPARK_DIST_CLASSPATH and HADOOP_CONF_DIR have been defined in spark-env.sh.
- spark.master yarn, spark.yarn.stagingDir hdfs://hadoop-namenode:8020/user/bitnami/sparkStaging, and spark.yarn.jars hdfs://hadoop-namenode:8020/user/bitnami/spark/jars/ have been set in spark-defaults.conf (a rough sketch of these files follows this list).
- The Spark jars have been uploaded to HDFS via hadoop fs -put $SPARK_HOME/jars/* hdfs://hadoop-namenode:8020/user/bitnami/spark/jars/
- The logs accessible via the web interface (http://hadoop-namenode:8042) do not provide any further details about the error.
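For completeness, here is roughly what the relevant configuration looks like. The install paths below are placeholders (yours will differ), and populating SPARK_DIST_CLASSPATH from hadoop classpath is one common approach; the spark-defaults.conf values are exactly the ones listed above:

# .bashrc (install paths are placeholders)
export HADOOP_HOME=/opt/hadoop
export SPARK_HOME=/opt/spark
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

# spark-env.sh
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# spark-defaults.conf
spark.master yarn
spark.yarn.stagingDir hdfs://hadoop-namenode:8020/user/bitnami/sparkStaging
spark.yarn.jars hdfs://hadoop-namenode:8020/user/bitnami/spark/jars/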
I figured out why I was getting this error. It turns out that I made an error while specifying spark.yarn.jars in spark-defaults.conf.
The value of this property must be

hdfs://hadoop-namenode:8020/user/bitnami/spark/jars/*

instead of

hdfs://hadoop-namenode:8020/user/bitnami/spark/jars/

In other words, the property must point to the jar files themselves (hence the trailing wildcard), not to the folder containing them.
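So the corrected line in spark-defaults.conf reads:

spark.yarn.jars hdfs://hadoop-namenode:8020/user/bitnami/spark/jars/*

A quick way to sanity-check that the jars are actually present at that location (the spark-yarn jar is the one that contains ExecutorLauncher):

hadoop fs -ls hdfs://hadoop-namenode:8020/user/bitnami/spark/jars/ | grep yarn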