Tags: java, hadoop, apache-spark, hadoop-yarn

Why do only a few nodes work in Apache Spark on YARN?


I have 7 datanodes and 1 namenode. Every node has 32 GB of memory and 20 cores, so I set the YARN container memory to 30 GB and the container virtual CPU cores to 18.

However, only three of the datanodes do any work; the rest stay idle.

Below is my spark-submit command:

/opt/spark/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-memory 4g \
--driver-cores 18 \
--executor-memory 8g \
--executor-cores 18 \
--num-executors 7 \
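
To see which hosts actually received executors, one option is Spark's status tracker. This is a minimal sketch, assuming Spark 2.x and that `spark` is the running SparkSession from the question's code:

import org.apache.spark.SparkExecutorInfo;

// List the hosts that registered executors, so you can see
// how many of the 7 datanodes are actually in use.
SparkExecutorInfo[] executors = spark.sparkContext()
        .statusTracker()
        .getExecutorInfos();
for (SparkExecutorInfo e : executors) {
    System.out.println(e.host() + " running tasks: " + e.numRunningTasks());
}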

Java code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.storage.StorageLevel;

SQLContext sqlc = new SQLContext(spark);

Dataset<Row> df = sqlc.read()
        .format("com.databricks.spark.csv")
        .option("inferSchema", "true")
        .load(traFile);

// repartition() returns a new Dataset, so the result must be reassigned
df = df.repartition(PartitionSize); // PartitionSize = 7
df.persist(StorageLevel.MEMORY_ONLY());
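
To confirm the repartition actually spread the rows evenly across the 7 partitions, a quick check with the standard spark_partition_id() function could look like this sketch:

import static org.apache.spark.sql.functions.spark_partition_id;

// Count the rows that landed in each partition after repartition().
// A roughly even spread across 7 partitions means the data is balanced.
df.groupBy(spark_partition_id().alias("partition"))
  .count()
  .show();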

This is my data information:

[screenshot: my data information]

I also tried the command below to rebalance the HDFS blocks:

sudo -u hdfs hdfs balancer

However, it did not help:

[screenshot: Nodes of the cluster]


Solution

  • I solved this problem by adding the following option to my script:

    --conf "spark.locality.wait.node=0"
    

    Below is my new script:

    /opt/spark/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --driver-cores $drivercores \
    --executor-memory 8g \
    --executor-cores $execores \
    --num-executors $exes \
    --conf "spark.locality.wait.node=0" \
    

    Thanks to this option, all of the nodes now do work.
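
    Why it works: spark.locality.wait (and its per-level variants such as spark.locality.wait.node) controls how long Spark waits for a task slot with good data locality before falling back to a less-local executor. When only a few nodes hold the HDFS block replicas, tasks pile up on those nodes; setting the wait to 0 lets tasks run immediately on any free executor. The same option can also be set in code when building the session; a minimal sketch (the app name is a placeholder):

    import org.apache.spark.sql.SparkSession;

    // Sketch: set the locality wait when creating the SparkSession
    // instead of on the spark-submit command line.
    SparkSession spark = SparkSession.builder()
            .appName("my-app") // placeholder name
            .config("spark.locality.wait.node", "0")
            .getOrCreate();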