
User: batch.amazonaws.com is not authorized to perform: sts:AssumeRole on resource


I've been trying to create some infrastructure that includes a bunch of services like EC2, ECS, S3, and Batch (and a few more). Everything seems fine until it reaches the step that builds the Batch process.

I was following a Medium blog post, and here's the CF template: GitHub repo link

This YAML is outdated, and I have made some modifications here and there, but none to the roles.

I've had more than 3 CloudFormation stacks stuck in rollback because CloudFormation can't stabilise the compute environment it builds from my YAML config. I looked at the compute environment itself to see the exact error, and this is what I get:

DELETING - CLIENT_ERROR - User: batch.amazonaws.com is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::402726478692:role/service-role/AWSBatchServiceRole (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: f9d6c19d-4e77-4814-ac2c-b437e0546977; Proxy: null)

Now it won't even delete this compute environment on automated rollback. But my main concern is: why can't it be created in the first place? I've gone through the documentation and a few questions here on the same topic, but nothing seemed to work.

Here's an excerpt from my YAML config, starting with the compute environment:

  ComputeEnvironment:
    Type: "AWS::Batch::ComputeEnvironment"
    Properties:
      Type: MANAGED
      ServiceRole: !Sub "arn:aws:iam::${AWS::AccountId}:role/service-role/AWSBatchServiceRole"
      ComputeEnvironmentName: !Sub "${Environment}-batch-processing_3"
      ComputeResources:
        MaxvCpus: 1
        SecurityGroupIds:
          - !Ref SecurityGroup
        Type: EC2
        Subnets: !Ref Subnets
        MinvCpus: 1
        InstanceRole: !Ref ECSInstanceProfile
        InstanceTypes:
          - "c6gd.medium"
        Tags: {"Name": !Sub "${Environment} - Batch Instance" }
        DesiredvCpus: 1
      State: ENABLED

  JobQueue:
    DependsOn: ComputeEnvironment
    Type: "AWS::Batch::JobQueue"
    Properties:
      ComputeEnvironmentOrder:
        - Order: 1
          ComputeEnvironment: !Ref ComputeEnvironment
      State: ENABLED
      Priority: 1
      JobQueueName: "HighPriority"

  Job:
    Type: "AWS::Batch::JobDefinition"
    Properties:
      Type: container
      JobDefinitionName: !Sub "${Environment}-batch-s3-processor"
      ContainerProperties:
        Memory: 2048
        Privileged: false
        JobRoleArn: !Ref JobRole
        ReadonlyRootFilesystem: true
        Vcpus: 1
        Image: !Sub "${AWS::AccountId}.dkr.ecr.us-west-2.amazonaws.com/${DockerImage}"
      RetryStrategy:
        Attempts: 1

  JobRole:
    Type: "AWS::IAM::Role"
    Properties:
      Path: "/"
      RoleName: !Sub "${Environment}-BatchJobRole"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Action: 
              - "sts:AssumeRole"
            Effect: "Allow"
            Principal:
              Service: 
                - "ecs-tasks.amazonaws.com"
                - "batch.amazonaws.com"
      Policies:
        -
          PolicyName: !Sub "${Environment}-s3-access"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
                Action: 
                  - "s3:*"
                  - "iam:*"
                  - "batch:*"
                Resource: !Sub "arn:aws:s3:::batch-${AWS::AccountId}-${AWS::Region}/*"

  ECSInstanceProfile:
    Type: "AWS::IAM::InstanceProfile"
    Properties:
      Path: "/"
      Roles:
        - !Ref ECSRole

  ECSRole:
    Type: "AWS::IAM::Role"
    Properties:
      Path: "/"
      RoleName: !Sub "${Environment}-batch-ecs-role"
      SourceAccount:
        Ref: AWS::AccountId
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Action: "sts:AssumeRole"
            Effect: "Allow"
            Principal:
              Service: 
                - "ec2.amazonaws.com"
                - "batch.amazonaws.com"
      Policies:
        - PolicyName: !Sub "${Environment}-full-access-for-batch-resource"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
                Action: 
                  - "s3:*"
                  - "iam:*"
                  - "batch:*"
                Resource: !Sub "arn:aws:s3:::batch-${AWS::AccountId}-${AWS::Region}/*"
        - PolicyName: !Sub ${Environment}-ecs-batch-policy
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
                Action:
                  - "ecs:CreateCluster"
                  - "ecs:DeregisterContainerInstance"
                  - "ecs:DiscoverPollEndpoint"
                  - "ecs:Poll"
                  - "ecs:RegisterContainerInstance"
                  - "ecs:StartTelemetrySession"
                  - "ecs:StartTask"
                  - "ecs:Submit*"
                  - "logs:CreateLogStream"
                  - "logs:PutLogEvents"
                  - "logs:DescribeLogStreams"
                  - "logs:CreateLogGroup"
                  - "ecr:BatchCheckLayerAvailability"
                  - "ecr:BatchGetImage"
                  - "ecr:GetDownloadUrlForLayer"
                  - "ecr:GetAuthorizationToken"
                  - "s3:*"
                  - "batch:*"
                Resource: "*"
        - PolicyName: !Sub "${Environment}-ecs-instance-policy"
          PolicyDocument:
            Statement:
              -
                Effect: "Allow"
                Action:
                  - "ecs:DescribeContainerInstances"
                  - "ecs:ListClusters"
                  - "ecs:RegisterTaskDefinition"
                  - "s3:*"
                  - "batch:*"
                Resource: "*"
              -
                Effect: "Allow"
                Action:
                  - "ecs:*"
                  - "s3:*"
                  - "batch:*"
                Resource: "*"

As you can see, I've tried giving more than enough permissions in these policies, which is already bad practice, but I still can't get it to assume the role. Any help would be appreciated.

EDIT: I have checked, and I can see the AWSBatchServiceRole role. I have attached the AWSBatchServiceRole and AWSBatchFullAccess policies to it, and its trust relationship does include sts:AssumeRole. This is the JSON from the trust relationship:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "batch.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Solution

  • One of my friends figured it out, and it worked. It was a dumb mistake.

    Changed arn:aws:iam::${AWS::AccountId}:role/service-role/AWSBatchServiceRole to arn:aws:iam::${AWS::AccountId}:role/AWSBatchServiceRole and it worked.

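    The service-role/ segment is the IAM role's path, and it belongs in the ARN only if the role was actually created under that path. Apparently my AWSBatchServiceRole has no path, so the ARN containing service-role/ pointed at a role that doesn't exist in my account.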
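
    One way to avoid hand-building the ARN at all is to declare the service role in the same template and pass its ARN with !GetAtt, so the path in the ARN always matches the role. The snippet below is only a sketch under that assumption: the resource name BatchServiceRole is hypothetical and not part of the original template, and the remaining ComputeEnvironment properties are assumed to stay as in the question.

```yaml
  # Hypothetical resource; the original template relies on a pre-existing role instead.
  BatchServiceRole:
    Type: "AWS::IAM::Role"
    Properties:
      # Whatever path is set here ends up in the role's ARN.
      Path: "/service-role/"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service: "batch.amazonaws.com"
            Action: "sts:AssumeRole"
      ManagedPolicyArns:
        # AWS managed policy for the Batch service role.
        - "arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole"

  ComputeEnvironment:
    Type: "AWS::Batch::ComputeEnvironment"
    Properties:
      Type: MANAGED
      # GetAtt returns the role's actual ARN, path included,
      # so it cannot drift from what exists in the account.
      ServiceRole: !GetAtt BatchServiceRole.Arn
      # ComputeEnvironmentName, ComputeResources and State as in the question.
```

    If you keep using a role created outside the stack, copy its ARN from the IAM console rather than assembling it with !Sub, since the console shows the path exactly as it was created.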