
How to add a TaskInstanceGroup to AWS EMR for autoscaling using CloudFormation?


I want to add an auto scaling group for task nodes, but I am unable to get it to work with CloudFormation.

The same thing works fine for CoreInstanceGroup, as shown below.

Instances:
  CoreInstanceGroup:
    InstanceCount: 1
    InstanceType: !Ref CoreInstanceType
    Market: ON_DEMAND
    Name: Core Instance
    AutoScalingPolicy:
      Constraints:
        MinCapacity: !Ref CoreMinCapacity
        MaxCapacity: !Ref CoreMaxCapacity

When I replace CoreInstanceGroup with TaskInstanceGroup, the linter gives a warning, and running the script fails with a "Property Not found" error.

I came across a Terraform script that refers to TaskInstanceGroup. Has anyone figured out a way to do this in CloudFormation?

TIA.


Solution

  • A task instance group is not part of AWS::EMR::Cluster, which is why you are getting the error.
    You have to attach the task instance group as a separate resource
    of type AWS::EMR::InstanceGroupConfig.


    JobFlowId: !Ref myEMRCluster determines which cluster the instance group is attached to. myEMRCluster is the logical name of the EMR cluster resource in the template.
    You can attach multiple task instance groups, each with its own autoscaling policy.
    You can also keep the task group in a separate CloudFormation template. In that case you have to pass the cluster ID explicitly, for example JobFlowId: 'j-ABCD123456789'.

    AWSTemplateFormatVersion: 2010-09-09
    Resources:
      myEMRCluster:
        Type: 'AWS::EMR::Cluster'
        Properties: <... Your existing config ...>
      TaskInstanceGroup:
        Type: 'AWS::EMR::InstanceGroupConfig'
        Properties:
          InstanceRole: TASK
          InstanceCount: 0
          InstanceType: 'r5.8xlarge'
          Market: SPOT
          BidPrice: '1.110'
          Name: cfnTask
          JobFlowId: !Ref myEMRCluster
          AutoScalingPolicy:
            Constraints:
              MinCapacity: 0
              MaxCapacity: 40
            Rules:
              - Name: container-pending-ratio-scale-out
                Description: >-
                  Replicates the default scale-out rule in the console for YARN
                  memory.
                Action:
                  SimpleScalingPolicyConfiguration:
                    AdjustmentType: CHANGE_IN_CAPACITY
                    ScalingAdjustment: 10
                    CoolDown: 300
                Trigger:
                  CloudWatchAlarmDefinition:
                    ComparisonOperator: GREATER_THAN
                    EvaluationPeriods: 2
                    MetricName: ContainerPendingRatio
                    Namespace: AWS/ElasticMapReduce
                    Period: 300
                    Threshold: 2
                    Statistic: AVERAGE
                    Unit: COUNT
                    Dimensions:
                      - Key: JobFlowId
                        # Literal EMR placeholder (no !Sub); EMR fills in the cluster ID.
                        Value: '${emr.clusterId}'
              - Name: idle-scale-in
                Description: Replicates the default scale-in rule in the console for idle.
                Action:
                  SimpleScalingPolicyConfiguration:
                    AdjustmentType: CHANGE_IN_CAPACITY
                    ScalingAdjustment: -40
                    CoolDown: 300
                Trigger:
                  CloudWatchAlarmDefinition:
                    ComparisonOperator: LESS_THAN_OR_EQUAL
                    EvaluationPeriods: 2
                    MetricName: ContainerAllocated
                    Namespace: AWS/ElasticMapReduce
                    Period: 300
                    Threshold: 0
                    Statistic: AVERAGE
                    Unit: COUNT
                    Dimensions:
                      - Key: JobFlowId
                        Value: '${emr.clusterId}'
      myEMRStep:
        Type: 'AWS::EMR::Step'
        Properties: <... If you have any ...>
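
    The separate-template variant mentioned above could look roughly like the sketch below. The parameter names ClusterId and TaskInstanceType, the m5.xlarge default, the group name, and the scaling numbers are illustrative assumptions, not part of the original setup. The two rules reuse the ContainerPendingRatio and ContainerAllocated triggers from the template above.

    AWSTemplateFormatVersion: 2010-09-09
    Parameters:
      ClusterId:
        Type: String
        Description: 'ID of an existing EMR cluster, e.g. j-ABCD123456789 (assumed parameter name)'
      TaskInstanceType:
        Type: String
        Default: m5.xlarge   # assumed default, pick what fits your workload
    Resources:
      TaskInstanceGroup:
        Type: 'AWS::EMR::InstanceGroupConfig'
        Properties:
          InstanceRole: TASK
          InstanceCount: 0
          InstanceType: !Ref TaskInstanceType
          Market: ON_DEMAND
          Name: cfnTaskStandalone
          # The cluster ID comes from a parameter because the cluster
          # is not defined in this template.
          JobFlowId: !Ref ClusterId
          AutoScalingPolicy:
            Constraints:
              MinCapacity: 0
              MaxCapacity: 10
            Rules:
              - Name: scale-out-on-pending-containers
                Action:
                  SimpleScalingPolicyConfiguration:
                    AdjustmentType: CHANGE_IN_CAPACITY
                    ScalingAdjustment: 2
                    CoolDown: 300
                Trigger:
                  CloudWatchAlarmDefinition:
                    ComparisonOperator: GREATER_THAN
                    EvaluationPeriods: 2
                    MetricName: ContainerPendingRatio
                    Namespace: AWS/ElasticMapReduce
                    Period: 300
                    Threshold: 2
                    Statistic: AVERAGE
                    Unit: COUNT
                    Dimensions:
                      - Key: JobFlowId
                        Value: '${emr.clusterId}'
              - Name: scale-in-when-idle
                Action:
                  SimpleScalingPolicyConfiguration:
                    AdjustmentType: CHANGE_IN_CAPACITY
                    ScalingAdjustment: -2
                    CoolDown: 300
                Trigger:
                  CloudWatchAlarmDefinition:
                    ComparisonOperator: LESS_THAN_OR_EQUAL
                    EvaluationPeriods: 2
                    MetricName: ContainerAllocated
                    Namespace: AWS/ElasticMapReduce
                    Period: 300
                    Threshold: 0
                    Statistic: AVERAGE
                    Unit: COUNT
                    Dimensions:
                      - Key: JobFlowId
                        Value: '${emr.clusterId}'

    Assuming the file is saved as task-group.yaml (a hypothetical name), it could be deployed with something like: aws cloudformation deploy --template-file task-group.yaml --stack-name emr-task-group --parameter-overrides ClusterId=j-ABCD123456789. This sketch also assumes the target cluster was created with an AutoScalingRole, which EMR needs in order to apply automatic scaling policies.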
    
    

    Hope this helps.