
How to set YARN memoryOverhead from AWSCLI for EMR


I'm having a little trouble grokking why exactly my Spark job died, so I'll include the traceback at the bottom of this post so that someone more experienced than me can give me some insight. :) As far as I can tell, my executors were being killed because the memoryOverhead was being exceeded. How can I set this from the AWS CLI so that I don't run into this issue?

Here is some of my traceback:

16/05/17 20:20:46 WARN TaskSetManager: Lost task 97.0 in stage 3.0 (TID 9937, ip-172-31-14-59.us-west-2.compute.internal): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 WARN TaskSetManager: Lost task 60.0 in stage 3.0 (TID 9900, ip-172-31-14-59.us-west-2.compute.internal): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 WARN TaskSetManager: Lost task 134.0 in stage 3.0 (TID 9974, ip-172-31-14-59.us-west-2.compute.internal): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 WARN TaskSetManager: Lost task 23.0 in stage 3.0 (TID 9863, ip-172-31-14-59.us-west-2.compute.internal): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 INFO YarnClientSchedulerBackend: Asked to remove non-existent executor 9
16/05/17 20:20:46 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 ERROR YarnScheduler: Lost executor 15 on ip-172-31-14-46.us-west-2.compute.internal: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 WARN TaskSetManager: Lost task 88.0 in stage 3.0 (TID 9928, ip-172-31-14-46.us-west-2.compute.internal): ExecutorLostFailure (executor 15 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 WARN TaskSetManager: Lost task 51.0 in stage 3.0 (TID 9891, ip-172-31-14-46.us-west-2.compute.internal): ExecutorLostFailure (executor 15 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 WARN TaskSetManager: Lost task 125.0 in stage 3.0 (TID 9965, ip-172-31-14-46.us-west-2.compute.internal): ExecutorLostFailure (executor 15 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 WARN TaskSetManager: Lost task 14.0 in stage 3.0 (TID 9854, ip-172-31-14-46.us-west-2.compute.internal): ExecutorLostFailure (executor 15 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 INFO YarnClientSchedulerBackend: Asked to remove non-existent executor 15
16/05/17 20:20:46 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN for exceeding memory limits. 5.6 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 ERROR YarnScheduler: Lost executor 14 on ip-172-31-14-61.us-west-2.compute.internal: Container killed by YARN for exceeding memory limits. 5.6 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 WARN TaskSetManager: Lost task 85.0 in stage 3.0 (TID 9925, ip-172-31-14-61.us-west-2.compute.internal): ExecutorLostFailure (executor 14 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.6 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 WARN TaskSetManager: Lost task 48.0 in stage 3.0 (TID 9888, ip-172-31-14-61.us-west-2.compute.internal): ExecutorLostFailure (executor 14 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.6 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 WARN TaskSetManager: Lost task 122.0 in stage 3.0 (TID 9962, ip-172-31-14-61.us-west-2.compute.internal): ExecutorLostFailure (executor 14 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.6 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/05/17 20:20:46 WARN TaskSetManager: Lost task 11.0 in stage 3.0 (TID 9851, ip-172-31-14-61.us-west-2.compute.internal): ExecutorLostFailure (executor 14 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.6 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
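As a side note on where the "5.5 GB" limit in these messages comes from: assuming the Spark default of `memoryOverhead = max(384 MB, 10% of executor memory)`, a 5 GB executor gets a 512 MB overhead, which YARN enforces as a 5.5 GB container cap. A small sketch of that arithmetic (the helper function is illustrative, not a Spark API):

```python
# Sketch: reproduce the YARN container limit seen in the log above,
# assuming Spark's default overhead formula max(384 MB, 10% of executor memory).
def yarn_container_limit_mb(executor_memory_mb: int) -> int:
    # Overhead covers off-heap usage (JVM internals, interned strings, etc.)
    overhead_mb = max(384, int(executor_memory_mb * 0.10))
    return executor_memory_mb + overhead_mb

# A 5 GB (5120 MB) executor -> 512 MB overhead -> 5632 MB, i.e. the 5.5 GB cap.
print(yarn_container_limit_mb(5120))  # -> 5632
```

So once the process's actual footprint creeps past executor memory plus that default 10%, YARN kills the container, which is why raising `spark.yarn.executor.memoryOverhead` explicitly is the suggested fix.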

Solution

  • You can simply pass the configuration as a --conf flag to spark-submit. For instance:

    spark-submit --master yarn-client --conf spark.yarn.executor.memoryOverhead=4096 --num-executors 10 --executor-memory 8G --executor-cores 6 ...
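Since the question asks about the AWS CLI specifically: you can also bake the setting into the cluster at creation time with the --configurations option, using the spark-defaults classification. A sketch (the release label, instance type, and count below are placeholders, not recommendations):

```shell
# Sketch: set spark.yarn.executor.memoryOverhead cluster-wide when creating
# the EMR cluster, so every spark-submit picks it up by default.
aws emr create-cluster \
  --release-label emr-4.7.0 \
  --applications Name=Spark \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --configurations '[
    {
      "Classification": "spark-defaults",
      "Properties": {
        "spark.yarn.executor.memoryOverhead": "4096"
      }
    }
  ]'
```

The --configurations JSON can also be kept in a file and referenced as `--configurations file://./spark-config.json`. A per-job --conf on spark-submit will still override this cluster-wide default.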