Tags: apache-spark, spark-submit

Spark job not using the worker nodes on the cluster


I have set up Spark on a cluster of 3 nodes: one is my namenode-master (named h1) and the other two are my datanode-workers (named h2 and h3). When I submit a Spark job from my master, it seems like the job is not distributed to the workers and runs only on the master. The command I use to submit the Spark job is

bin/spark-submit --class org.dataalgorithms.chap07.spark.FindAssociationRules /home/ubuntu/project_spark/data-algorithms-1.0.0.jar ./in/xaa

The reason I think it is running only on the master is that the Spark application UI shows only the master, h1, in the executor list. Shouldn't my worker nodes h2 and h3 appear there as well? (Spark UI screenshot)

Correct me if I am wrong. I am a newbie, so please excuse my ignorance.
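
One thing worth checking here is what master URL spark-submit actually resolves to. Its --verbose flag prints the parsed and resolved options, including the master, before the job starts; the command below is just my original command with that flag added:

bin/spark-submit --verbose --class org.dataalgorithms.chap07.spark.FindAssociationRules /home/ubuntu/project_spark/data-algorithms-1.0.0.jar ./in/xaa

If no --master is given and no spark.master is set in conf/spark-defaults.conf, Spark falls back to local[*], i.e. everything runs on the submitting node, which would match seeing only h1 in the executor list.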


Solution

  • Thank you for all the help and suggestions. I tried many of them but kept ending up with one error or another. What finally worked was specifying --master spark://IP:PORT in my regular command, so the new execution command looked like this:

    bin/spark-submit --class org.dataalgorithms.chap07.spark.FindAssociationRules --master spark://IP:PORT /home/ubuntu/project_spark/data-algorithms-1.0.0.jar ./in/xaa
    

    This started my Spark job in a truly distributed fashion across the cluster.
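
    In case it saves someone a step: assuming a standard Spark layout (conf/spark-defaults.conf next to bin/), the master URL can also be set there once on the node you submit from, so --master is no longer needed on every spark-submit. Replace IP:PORT with the spark:// URL shown at the top of the standalone master's web UI:

    echo "spark.master    spark://IP:PORT" >> conf/spark-defaults.conf

    With that in place, spark-submit should pick up the master URL automatically, and the original command from the question should run against the cluster rather than locally.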