Tags: amazon-web-services, apache-spark, mapreduce, rdd, distributed-computing

Issues running Spark application on AWS with compute-optimized instances


Hello, I'm comparing the performance of my Spark algorithm on two different clusters: one with more compute power and one with more memory.

  1. Cluster 1 has 5 nodes of AWS c4.xlarge instances, each with 4 vCPUs and 7.5 GiB of main memory.
  2. Cluster 2 has 5 nodes of AWS r4.xlarge instances, each with 4 vCPUs and 30.5 GiB of main memory.

My code is divided into 13 stages, but only the last 5 are the ones whose performance I actually need to care about. These five are shown below:

[Screenshot: Spark UI statistics for the last five stages when running on cluster 2]

The picture above shows the statistics of the stages when running my code on cluster 2 (memory optimized). You can see that stages 11 and 13 take a few minutes, because a flatMap and a map respectively do some heavy sequential work on lists.

Since the work stages 11 and 13 do is sequential within each partition, I would expect better performance for those two stages on cluster 1 (compute optimized). But what I actually got when running on cluster 1 is that the last 3 stages ran only 32 tasks (so only 32 partitions) and the running time was slower, about 2 minutes more for each stage.
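
To give an idea of the shape of the work (this is not my actual code; the names, data sizes, and numbers below are made up), the flatMap and map essentially do something like this:

import org.apache.spark.{SparkConf, SparkContext}

// Simplified stand-in for the CPU-bound, per-partition work described above.
object SequentialWorkSketch {
  // Heavy sequential processing of one list (placeholder computation).
  def heavyWork(xs: List[Long]): List[Long] =
    xs.map { x =>
      var acc = x
      var i = 0
      while (i < 100000) { acc = (acc * 31 + i) % 1000003; i += 1 }
      acc
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sequential-work-sketch"))

    // 32 partitions, each holding lists that get processed sequentially.
    val lists = sc.parallelize(Seq.fill(32)(List.tabulate(1000)(_.toLong)), 32)

    val expanded = lists.flatMap(heavyWork)     // flatMap doing heavy work on lists (like my stage 11)
    val doubled  = expanded.map(x => x * 2 + 1) // map doing further per-element work (like my stage 13)

    println(doubled.count())
    sc.stop()
  }
}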

I realized that only 3 executors were running, so only 3 of the 4 instances were actually working on my problem. What I did then was submit the app this way:

spark-submit --master yarn --deploy-mode cluster --num-executors 4 --executor-cores 4 --class myclass myjar myparams

I thought that forcing the number of executors with --num-executors 4 would force the 4th node to get one executor, but that didn't work either. Only 3 executors were active, on 3 instances.

I'm sure this is the problem. Don't you think the sequential stages should run faster on cluster 1? Or do you think the problem is that cluster 1 has less main memory?

Thank you for helping me find an answer.


Solution

  • Using resources optimally on EMR requires a bit of configuration tweaking and also a bit of knowledge as to how YARN works. I highly recommend reading this blog post from Cloudera to get an overview of how to tune your applications with YARN.

    Anyway, in this particular case, the reason why you only see 3 executors instead of 4 is that you have specified that each executor should have 4 cores (with the flag --executor-cores 4). Since there are only 4 cores available on each worker, and since YARN takes 1 core on one of the workers to run the application master, you essentially have 3 workers with 4 cores available and 1 worker with 3 cores available. The worker with only 3 cores available cannot run an executor that requires 4 cores, so an executor is simply not started on that worker. This leaves you with 3 executors (see the small sketch at the end of this answer for one way to make an executor fit on every worker).

    Anyway, the aforementioned blog post describes this in detail :)

    Oh, and in terms of which type of cluster will run faster, you have to test it. I follow your theoretical reasoning, but I have long since given up on trying to predict which type of cluster will be better for a specific task. The truth is that unless you fully understand how Spark creates its execution plans, you are essentially just guessing. I recommend that you test different instance types and use Ganglia to monitor how your clusters utilize the different resources.
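
    For completeness, here is a rough Scala sketch (an illustration, not a tuned configuration) of a resource request that lets one executor fit on every worker under the constraints above; the same values could equally be passed as spark-submit flags. The memory value is an assumption and has to fit under the YARN memory limit of a c4.xlarge node, so check it against your cluster's settings.

    import org.apache.spark.sql.SparkSession

    // Sketch only: with 4 vCPUs per node and 1 vCPU taken by the YARN
    // application master, a 3-core executor fits on each of the 4 workers,
    // whereas a 4-core executor fits on only 3 of them.
    object ExecutorFitSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("executor-fit-sketch")
          .config("spark.executor.instances", "4")
          .config("spark.executor.cores", "3")
          // Assumed value: must stay below the node's YARN memory limit
          // (a c4.xlarge has 7.5 GiB in total, not all of it usable by YARN).
          .config("spark.executor.memory", "4g")
          .getOrCreate()

        // ... the actual job would run here ...

        spark.stop()
      }
    }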