Search code examples
amazon-web-servicesemr

How AWS EMR react after reaching the 256 pending/running jobs limit is reached?


According to

https://docs.aws.amazon.com/emr/latest/ManagementGuide/AddMoreThan256Steps.html

AWS EMR "Beginning with AMI 3.1.1 (Hadoop 2.x) and AMI 2.4.8 (Hadoop 1.x), you can submit an unlimited number of steps over the lifetime of a long-running cluster, but only 256 can be active or pending at any given time"

My questions are:

  1. If you already reached the 256 limit, where and how you'll find the rest of submitted jobs that were submitted?

  2. Is EMR keeping a queue of the submitted jobs, and when a job is done (succes/failed) it will pick another job from the "invisible client job queue"?

  3. If EMR has such a queue (as explained in 2), for how long it will keep these jobs?

  4. Can we access via EMR API, the submitted jobs that are not among the 256 jobs?

I look forward for your answers/

Regards,

Florin


Solution

  • I've discussed with the Amazon Support Center and they gave the following answers:

    "1. This limit is imposed by the EMR Step API (this is on the AWS side), not by YARN or Spark themselves. This is what you see when you look at the cluster properties in the console and navigate to 'Steps' within the cluster details, or make a 'list-steps' call from the CLI. You'll know when you've hit this limit, because when you submit a new step through the step API (either through the console or 'add-steps' call from CLI), you'll get an error stating that there are already 256 steps in a running or pending state, and the step will not be accepted. Therefore, there will not be a record of that step's submission

    1. I'm not sure how to answer this one in exact terms, so I'll just explain how an EMR cluster is going to handle jobs. In a cluster running Hadoop 2.x, YARN (the cluster's resource manager) is responsible for tracking jobs. When a job is submitted to YARN (either via the AWS api as a step or directly to YARN,) YARN will keep track of the submitted jobs and process them as the cluster's scheduler and resources allow. The step API is just an abstraction layer that lets you leverage the AWS API to submit jobs to YARN without directly accessing the cluster, using the EMR service endpoint instead. When the request is received at the step API, that information is collected by a daemon on the cluster called the 'instance-controller', and passed to YARN as a job submission. Step submissions can be seen in the console in the cluster's details on the 'Steps' tab, and YARN jobs can be seen by accessing the cluster's shell and running 'yarn application -list -appStates ALL'. Jobs are processed by order of submission and/or priority, based on the configuration of the YARN scheduler being used.

    2. The cluster will track each job for the life of the cluster.

    3. Again, I want to clarify that the limit is specific to jobs submitted via the EMR Step API; jobs can still be submitted directly to YARN via its own API if you want, and you can access both the list of pending/active steps and YARN jobs separately.

    YARN has a huge capacity for storing and managing information; the step API's limits have to do with the storage layers that keep persistent information about the cluster's configuration on the EMR back-end."

    My resume, confirmed by the AWS support team:

    • 256 jobs limit is posed by the EMR jobs API and it is a HARD limit.

    • As a client, when I'll reach the limit, I'll get an exception that this limit is already reached and all the step submissions will be rejected.

    • When this limit is reached, it is still possible to submit jobs via YARN client.

    I hope these will help others that have these questions.

    Regards, Florin