amazon-web-services, amazon-emr, aws-step-functions

Auto Terminate EMR Cluster using Step Functions


I have a use case where I will be submitting a dynamic number of jobs to the cluster, so I am opting to submit jobs via the SDK from a Lambda rather than adding each job submission as a task in the step function. The EMR cluster will be used once a week, so I want to opt for the on-demand variant.

It looks like the "auto-terminate" parameter is not supported when creating a cluster from Step Functions. As per the docs, the field Instances.KeepJobFlowAliveWhenNoSteps is mandatory and must have the Boolean value TRUE.

Is there an alternative way to terminate the cluster after all jobs have completed?


Solution

  • You have a few options for terminating the cluster, depending on your scenario.

    1. Since you are using Lambda, you can check the state of the cluster periodically and, if it is WAITING, terminate the cluster by its ID. You can also create a CloudWatch event that triggers an AWS Lambda function to check whether the EMR cluster is idle. You can find a good answer for this specific approach here and the code implementation by the same user here.

    2. A naive, hacky approach that can still work is to deliberately submit a failing step as the final step and set ActionOnFailure to 'TERMINATE_CLUSTER' when submitting it with add_job_flow_steps().
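
As a rough illustration of option 2, here is a minimal sketch that appends an always-failing step to the real job steps before calling add_job_flow_steps(). The function names, the step name, and the use of command-runner.jar with `bash -c "exit 1"` as the failing command are assumptions for illustration, not something from the original answer. It also assumes steps run one at a time (step concurrency of 1), so the failing step only runs after the real jobs.

```python
def build_terminator_step():
    """Build a step that always fails so that EMR terminates the cluster.

    The step name and the failing command are hypothetical; any command
    that exits non-zero should behave the same way.
    """
    return {
        "Name": "terminate-cluster-sentinel",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["bash", "-c", "exit 1"],
        },
    }


def submit_jobs_with_terminator(emr_client, cluster_id, job_steps):
    """Submit the real job steps followed by the failing sentinel step.

    emr_client is expected to behave like boto3.client("emr").
    Returns the list of step IDs reported by EMR.
    """
    steps = list(job_steps) + [build_terminator_step()]
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]
```

In the Lambda you would pass `boto3.client("emr")` as `emr_client`. One thing to watch: if a real job step uses CANCEL_AND_WAIT and fails, the pending sentinel step is cancelled too and the cluster stays up, so CONTINUE is probably the safer setting for the job steps in this scheme.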

    Update on your question:

    Would there be a potential race condition in which the EMR cluster could terminate after it has started but before the jobs are submitted?

    The wait between the cluster starting and the jobs being submitted (or the first job running) is not always the same, so you can build logic around a maximum idle-time threshold for the CloudWatch check.
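
The idle-threshold idea can be sketched roughly as follows, for example inside a Lambda triggered on a schedule. This is an assumption-laden sketch rather than the linked implementation: the function names and the 30-minute default are made up, and idle time is measured from the later of the cluster's ReadyDateTime and the most recent step's EndDateTime. Only the first page of list_steps results is inspected, on the assumption that it contains the most recent steps.

```python
from datetime import datetime, timedelta, timezone


def idle_since(emr_client, cluster_id):
    """Return when the cluster became idle, or None if it is not WAITING."""
    status = emr_client.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]
    if status["State"] != "WAITING":
        return None
    candidates = []
    ready_at = status.get("Timeline", {}).get("ReadyDateTime")
    if ready_at is not None:
        candidates.append(ready_at)
    # Collect end times of finished steps (first page of results only).
    for step in emr_client.list_steps(ClusterId=cluster_id)["Steps"]:
        ended_at = step["Status"].get("Timeline", {}).get("EndDateTime")
        if ended_at is not None:
            candidates.append(ended_at)
    return max(candidates) if candidates else None


def terminate_if_idle(emr_client, cluster_id, max_idle=timedelta(minutes=30), now=None):
    """Terminate the cluster if it has been WAITING for at least max_idle.

    max_idle is a hypothetical threshold; pick one longer than the usual
    gap between the cluster becoming ready and the first job submission.
    Returns True if a termination request was sent.
    """
    now = now or datetime.now(timezone.utc)
    since = idle_since(emr_client, cluster_id)
    if since is None or now - since < max_idle:
        return False
    emr_client.terminate_job_flows(JobFlowIds=[cluster_id])
    return True
```

Because a freshly started cluster with no steps yet is measured from its ready time, it gets the full threshold before it can be terminated, which is what narrows the race between startup and the first submission.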