I am trying to figure out how to terminate an EMR cluster once all the steps submitted to it are 'COMPLETED'|'CANCELLED'|'FAILED'|'INTERRUPTED'. There are three Lambda functions. I've got as far as Lambda 3's step submission, but I'm unable to do the rest.
I have successfully created the EMR cluster through:
conn = boto3.client("emr")
cluster_id = conn.run_job_flow()
and submitted steps through:
conn = boto3.client("emr")
action = conn.add_job_flow_steps(JobFlowId=cluster_id, Steps=event["steps"])
Now how can this termination be triggered only on the given condition? I saw the boto3 API doc, which has client.terminate_job_flows(), but this function doesn't wait for the steps to finish or fail; it starts terminating the cluster immediately.
Is there a way to change KeepJobFlowAliveWhenNoSteps from TRUE to FALSE when all my steps are done? Then I think the cluster should shut itself down automatically. But going by the API docs, I didn't find any option to change this parameter once run_job_flow() has been called.
I hope I was able to convey the issue correctly. Any help?
Note: I'm using Python 3.8 in AWS Lambda. Each step is a Spark job.
I agree with your research. The optimal situation would be to set KeepJobFlowAliveWhenNoSteps to FALSE to have the cluster self-terminate.
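As a rough sketch of that idea: since the flag apparently cannot be changed after creation, it would have to be set when the cluster is created, with the steps passed in the same run_job_flow() call. This is a different flow from the one in the question, where steps are added later. The helper name with_auto_terminate and the base_kwargs argument are made up for illustration; only the Instances["KeepJobFlowAliveWhenNoSteps"] and Steps parameters come from the RunJobFlow API.

```python
import copy


def with_auto_terminate(run_job_flow_kwargs, steps):
    """Return a copy of the run_job_flow() arguments that makes the cluster
    shut itself down once the given steps have finished.

    run_job_flow_kwargs: whatever keyword arguments you already pass to
    run_job_flow() (hypothetical placeholder, not shown in the question).
    """
    kwargs = copy.deepcopy(run_job_flow_kwargs)
    # KeepJobFlowAliveWhenNoSteps lives inside the Instances structure.
    kwargs.setdefault("Instances", {})["KeepJobFlowAliveWhenNoSteps"] = False
    # Steps must be supplied at creation time, otherwise there is nothing to run.
    kwargs["Steps"] = list(steps)
    return kwargs
```

It would be used roughly as conn.run_job_flow(**with_auto_terminate(base_kwargs, event["steps"])), assuming the steps are already known when the cluster is created.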
I do notice that the RunJobFlow documentation says:
If the KeepJobFlowAliveWhenNoSteps parameter is set to TRUE, the cluster transitions to the WAITING state rather than shutting down after the steps have completed.
Therefore, the Lambda function could check whether the cluster is in the WAITING state and, if so, shut down the cluster. However, this would require checking repeatedly.
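A minimal sketch of that check, assuming emr is a boto3 EMR client (boto3.client("emr")) and that something outside this code, such as a scheduled Lambda invocation, calls it repeatedly. The function name terminate_if_waiting is made up for illustration. Note that a cluster that has not been given any steps yet may also be in WAITING, so the check should only run after the steps have been submitted.

```python
def terminate_if_waiting(emr, cluster_id):
    """Terminate the cluster if it is idle in the WAITING state.

    emr: a boto3 EMR client, e.g. boto3.client("emr").
    Returns True if termination was requested, False otherwise.
    """
    response = emr.describe_cluster(ClusterId=cluster_id)
    state = response["Cluster"]["Status"]["State"]
    if state == "WAITING":
        # No steps are running or pending, so it is safe to shut down.
        emr.terminate_job_flows(JobFlowIds=[cluster_id])
        return True
    return False
```

Each invocation makes one describe_cluster() call and returns, so the waiting happens between invocations rather than inside a single long-running Lambda.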
It might be possible to submit a final step that calls the EMR API to shut down the cluster. This means that the cluster is effectively calling for its own termination as a final step. (I haven't tried this concept, but it would be a clean way of performing the shutdown without having to repeatedly check the status.)
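An untested sketch of what such a final step might look like, assuming the cluster runs an EMR release that provides command-runner.jar, has the AWS CLI available on the master node, and has an instance role that is allowed to terminate the cluster. The function name build_self_terminate_step and the step name are made up for illustration; depending on the node's CLI configuration, a --region argument may also be needed.

```python
def build_self_terminate_step(cluster_id):
    """Build a step definition that asks EMR to terminate this cluster.

    The result is meant to be appended to the Steps list passed to
    add_job_flow_steps(), after the real Spark steps.
    """
    return {
        "Name": "terminate-cluster",  # hypothetical step name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # Runs: aws emr terminate-clusters --cluster-ids <cluster_id>
            "Args": ["aws", "emr", "terminate-clusters",
                     "--cluster-ids", cluster_id],
        },
    }
```

It could be submitted as conn.add_job_flow_steps(JobFlowId=cluster_id, Steps=event["steps"] + [build_self_terminate_step(cluster_id)]). One caveat: if an earlier step fails with ActionOnFailure set to CANCEL_AND_WAIT, the remaining steps are cancelled, including this one, so the earlier steps would need CONTINUE for the final step to run in every case.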
There is also a similar discussion about shutting down idle clusters on this Question: How to terminate AWS EMR Cluster automatically after some time