Tags: amazon-s3, airflow, boto3, amazon-iam

Apache Airflow: Fetching remote s3 logs fails: An error occurred (AccessDenied) when calling the AssumeRole operation


Apache Airflow version: 2.1.2

Environment:

  • Cloud provider or hardware configuration: AWS ECS Fargate

What happened:

After upgrading from 2.0.1 to 2.1.2, fetching the logs from S3 suddenly fails with: An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::111111111:assumed-role/airflow-ecs-task-role/cfdjkal342nk432hvbkjl34 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::111111111:role/airflow-ecs-task-role

Why is the ECS task not able to assume its own role? Isn't that basically what this error means?

What you expected to happen:

Getting the logs from remote s3 as before.

Anything else we need to know: All Fargate tasks (webserver, scheduler, worker) receive the following environment variables. I followed this approach to generate the connection URI.

- Name: AIRFLOW_CONN_LOGS_S3
  Value: !Sub 's3://s3?aws_account_id=111111111&role_arn=arn%3Aaws%3Aiam%3A%3A919107267526%3Arole%2Fairflow-ecs-task-role'
- Name: AIRFLOW__LOGGING__REMOTE_LOGGING
  Value: 'true'
- Name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
  Value: !Sub "s3://logs-bucket/"
- Name: AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID
  Value: logs_s3
- Name: AIRFLOW__LOGGING__ENCRYPT_S3_LOGS
  Value: 'false'
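Airflow reads the connection from the env var AIRFLOW_CONN_ followed by the upper-cased conn ID, and the URI's query string becomes the connection's extras. A standard-library sketch that rebuilds and decodes the URI shows that the extras include role_arn pointing at the task's own role, which appears to be what makes the AWS hook call sts:AssumeRole at all. The account ID 111111111 is the placeholder from the error messages, and env_var_name, build_uri and decode_extras are hypothetical helpers, not Airflow APIs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

CONN_ID = "logs_s3"  # value of AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID
ROLE_ARN = "arn:aws:iam::111111111:role/airflow-ecs-task-role"

def env_var_name(conn_id: str) -> str:
    # Connections from the environment use AIRFLOW_CONN_<CONN_ID upper-cased>.
    return "AIRFLOW_CONN_" + conn_id.upper()

def build_uri(account_id: str, role_arn: str) -> str:
    # urlencode percent-encodes ':' as %3A and '/' as %2F inside the ARN.
    query = urlencode({"aws_account_id": account_id, "role_arn": role_arn})
    return f"s3://s3?{query}"

def decode_extras(uri: str) -> dict[str, str]:
    # Each query parameter ends up as one key in the connection's extras.
    return {key: values[0] for key, values in parse_qs(urlsplit(uri).query).items()}

uri = build_uri("111111111", ROLE_ARN)
print(env_var_name(CONN_ID))  # AIRFLOW_CONN_LOGS_S3
print(uri)
print(decode_extras(uri))
```

Decoding the URI is a quick way to confirm which role the hook will try to assume before redeploying the tasks.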

How often does this problem occur? Once? Every time etc?

Any relevant logs to include? Put them here inside a details tag:

*** Failed to verify remote log exists s3://bucket/dag/dag/2021-07-23T11:37:30.860418+00:00/1.log.
An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::111111111:assumed-role/airflow-ecs-task-role/cfdjkal342nk432hvbkjl34 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::111111111:role/airflow-ecs-task-role
*** Falling back to local log
*** Log file does not exist:

The airflow-ecs-task-role has the following trust policy:

AssumeRolePolicyDocument:
  Statement:
    - Effect: Allow
      Principal:
        Service: ecs-tasks.amazonaws.com
      Action: 'sts:AssumeRole'

If I add the following principal, it works:

Principal:
  Service: ecs-tasks.amazonaws.com
  AWS: arn:aws:sts:::assumed-role/airflow-ecs-task-role/TASK_ID

Can someone explain why that is the case? The docs state that I cannot use a wildcard there, but I do not know the task ID beforehand.
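On the wildcard question: in a trust policy, an AWS principal given as a role ARN (arn:aws:iam::ACCOUNT:role/NAME) trusts every session of that role, while an assumed-role session ARN trusts only that one session, which is why the TASK_ID form works but cannot be written ahead of time. The Service principal ecs-tasks.amazonaws.com never matches a role session. The Python sketch below is a deliberately simplified model of that matching; it ignores conditions, Deny statements and everything else IAM evaluates, and principal_matches and SESSION_ARN are hypothetical names, not an AWS API.

```python
import re

# Assumed-role session ARN: account ID, role name, session name.
SESSION_ARN = re.compile(r"arn:aws:sts::(\d*):assumed-role/([^/]+)/(.+)")

def principal_matches(principal: dict, caller_arn: str) -> bool:
    """Simplified check: does the Principal block trust this session ARN?"""
    aws = principal.get("AWS", [])
    entries = [aws] if isinstance(aws, str) else list(aws)
    session = SESSION_ARN.fullmatch(caller_arn)
    # The role ARN that this session belongs to, if the caller is a role session.
    role_arn = f"arn:aws:iam::{session.group(1)}:role/{session.group(2)}" if session else None
    for entry in entries:
        if entry == caller_arn:
            # Exact session ARN: trusts only this one session (one task ID).
            return True
        if role_arn is not None and entry == role_arn:
            # Role ARN: trusts every session of the role, whatever the task ID.
            return True
    # A "Service" entry such as ecs-tasks.amazonaws.com is not matched here.
    return False

caller = "arn:aws:sts::111111111:assumed-role/airflow-ecs-task-role/cfdjkal342nk432hvbkjl34"
original = {"Service": "ecs-tasks.amazonaws.com"}
with_role = {
    "Service": "ecs-tasks.amazonaws.com",
    "AWS": "arn:aws:iam::111111111:role/airflow-ecs-task-role",
}
print(principal_matches(original, caller))   # False
print(principal_matches(with_role, caller))  # True
```

Using the role ARN avoids having to know the task ID, and it is the form the Solution section uses.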


Solution

  • Adding the task role's ARN as an AWS principal in the trust policy resolved the issue:

    Resources:
      ExecutionRole:
        Type: AWS::IAM::Role
        Properties:
          RoleName: !Sub ${EnvironmentName}-ecs-execution-role-${Stage}
          AssumeRolePolicyDocument:
            Statement:
              - Effect: Allow
                Principal:
                  Service: ecs-tasks.amazonaws.com
                  AWS: !Sub arn:aws:iam::${AWS::AccountId}:role/airflow-ecs-task-role-${Stage}
                Action: 'sts:AssumeRole'
          ManagedPolicyArns:
            - 'arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy'
            - 'arn:aws:iam::aws:policy/AmazonSSMReadOnlyAccess'
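For anyone applying the trust policy outside CloudFormation, this Python sketch builds the AssumeRolePolicyDocument above as plain JSON once !Sub has filled in ${AWS::AccountId} and ${Stage}. It mirrors the template's statement only; the account ID 111111111 and stage dev are placeholders, and trust_policy is a hypothetical helper.

```python
import json

def trust_policy(account_id: str, stage: str) -> dict:
    """Return the template's AssumeRolePolicyDocument with !Sub values substituted."""
    # Same ARN the template builds with !Sub from ${AWS::AccountId} and ${Stage}.
    task_role_arn = f"arn:aws:iam::{account_id}:role/airflow-ecs-task-role-{stage}"
    return {
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "ecs-tasks.amazonaws.com",
                    "AWS": task_role_arn,
                },
                "Action": "sts:AssumeRole",
            }
        ]
    }

print(json.dumps(trust_policy("111111111", "dev"), indent=2))
```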