Tags: amazon-web-services, amazon-cloudwatch, amazon-emr, terminate

How to terminate AWS EMR Cluster automatically after some time


I have been tasked with terminating a long-running EMR cluster after a set period of time (based on some metric). Google Dataproc offers this capability under the name "Cluster Scheduled Deletion".

Is this possible on EMR natively, perhaps using CloudWatch metrics? Or could I write a long-running JAR that sits on the EMR master node, polls YARN for some idle-time metric, and shuts down the cluster after a set period of time?

Edit, for clarification: I would like the cluster to be terminated once it has been idle for some amount of time x. For example, if the cluster has been up for a while but no jobs have run for, say, 1 hour and it is just sitting there doing nothing, I'd like to be able to terminate it.


Solution

  • The easiest method would be to use Amazon EMR Metrics and Dimensions for Amazon CloudWatch. There is an IsIdle boolean metric that "indicates that a cluster is no longer performing work".

    You could create a CloudWatch Alarm that fires when IsIdle has been true for more than x minutes. The alarm would send a message to Amazon SNS, which can trigger a Lambda function to shut down the cluster.

    Components:

    • Amazon CloudWatch Alarm
    • Amazon SNS topic
    • AWS Lambda function
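
A minimal sketch of how these three components could fit together, assuming boto3 and Python for the Lambda function. The alarm name and the helper names are hypothetical, and the shape of the SNS message (a JSON body with `NewStateValue` and `Trigger.Dimensions`) is an assumption about the standard CloudWatch alarm notification that should be checked against a real message. `idle_alarm_params` builds the arguments for CloudWatch `put_metric_alarm`, and `lambda_handler` terminates the cluster named in the alarm.

```python
import json
import math

EMR_NAMESPACE = "AWS/ElasticMapReduce"
PERIOD_SECONDS = 300  # EMR publishes its CloudWatch metrics every five minutes


def idle_alarm_params(cluster_id, idle_minutes, topic_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    periods = max(1, math.ceil(idle_minutes * 60 / PERIOD_SECONDS))
    return {
        "AlarmName": f"emr-idle-{cluster_id}",  # hypothetical naming scheme
        "Namespace": EMR_NAMESPACE,
        "MetricName": "IsIdle",
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": PERIOD_SECONDS,
        "EvaluationPeriods": periods,
        "Threshold": 1.0,  # IsIdle is 1 while idle, 0 otherwise
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],
    }


def cluster_ids_from_sns_event(event):
    """Extract EMR cluster ids from alarm notifications delivered via SNS."""
    ids = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") != "ALARM":
            continue  # ignore OK / INSUFFICIENT_DATA transitions
        for dim in message.get("Trigger", {}).get("Dimensions", []):
            if dim.get("name") == "JobFlowId":
                ids.append(dim["value"])
    return ids


def lambda_handler(event, context, emr=None):
    """Terminate every cluster whose idle alarm fired."""
    if emr is None:
        import boto3  # imported lazily so the helpers work without boto3

        emr = boto3.client("emr")
    ids = cluster_ids_from_sns_event(event)
    if ids:
        emr.terminate_job_flows(JobFlowIds=ids)
    return ids
```

The alarm itself could then be created once per cluster with something like `boto3.client("cloudwatch").put_metric_alarm(**idle_alarm_params(cluster_id, 60, topic_arn))`, where `topic_arn` is the ARN of the SNS topic the Lambda function subscribes to.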

    Update: This apparently isn't suitable (see comments below).

    An alternate method would be:

    • Use Amazon CloudWatch Events to run a Lambda function on a schedule, e.g. every few minutes (scheduled rules cannot fire more often than once a minute)
    • The Lambda function looks for any clusters with a particular tag that indicates how long to wait until shutdown (e.g. 40 minutes). If the tag is not present, the cluster remains untouched.
    • The Lambda function queries the cluster state (somehow -- probably via a Hadoop API call), then:
      • If the cluster is idle and there is no Idle Since tag, add an Idle Since tag with the current timestamp
      • If the cluster is idle and it has been more than x minutes since the timestamp in the Idle Since tag, terminate the cluster.
      • If the cluster is not idle, remove the Idle Since tag (if present)
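
The tag-based bookkeeping above can be sketched roughly as follows, again assuming boto3 and Python. The `IdleTimeoutMinutes` tag name, the ISO timestamp format, and both function names are assumptions made for illustration. How idleness is detected is left to the caller as the `is_idle` argument, since the answer does not settle that step.

```python
from datetime import datetime, timedelta, timezone

TIMEOUT_TAG = "IdleTimeoutMinutes"  # hypothetical tag: minutes to wait before shutdown
IDLE_SINCE_TAG = "Idle Since"  # tag name taken from the steps above


def decide(tags, is_idle, now):
    """Return 'skip', 'wait', 'mark_idle', 'clear_idle' or 'terminate'."""
    if TIMEOUT_TAG not in tags:
        return "skip"  # untagged clusters are never touched
    if not is_idle:
        return "clear_idle" if IDLE_SINCE_TAG in tags else "wait"
    if IDLE_SINCE_TAG not in tags:
        return "mark_idle"
    idle_since = datetime.fromisoformat(tags[IDLE_SINCE_TAG])
    limit = timedelta(minutes=int(tags[TIMEOUT_TAG]))
    return "terminate" if now - idle_since > limit else "wait"


def apply_idle_policy(emr, cluster_id, is_idle, now=None):
    """Read the cluster's tags, decide what to do, and carry it out.

    `emr` is expected to be a boto3 EMR client, e.g. boto3.client("emr").
    """
    now = now or datetime.now(timezone.utc)
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    tags = {t["Key"]: t["Value"] for t in cluster.get("Tags", [])}
    action = decide(tags, is_idle, now)
    if action == "mark_idle":
        emr.add_tags(
            ResourceId=cluster_id,
            Tags=[{"Key": IDLE_SINCE_TAG, "Value": now.isoformat()}],
        )
    elif action == "clear_idle":
        emr.remove_tags(ResourceId=cluster_id, TagKeys=[IDLE_SINCE_TAG])
    elif action == "terminate":
        emr.terminate_job_flows(JobFlowIds=[cluster_id])
    return action
```

Keeping the decision in a pure function (`decide`) separate from the API calls makes the three rules easy to unit-test without an AWS account. The scheduled Lambda function would call `apply_idle_policy` once per candidate cluster on every run.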