amazon-web-services | amazon-s3 | aws-cli | google-cloud-composer

How to install AWS CLI on Cloud Composer?


I need to install the AWS CLI tool on Google Cloud Composer so that I can use it with BashOperator from Airflow DAGs.

The AWS CLI documentation explains how to install it as a system package, but Cloud Composer doesn't have a supported way to install apt packages on all instances.

My motivation: I need to synchronize a large S3 bucket with another storage system. The aws s3 sync command (link) suits this perfectly. Unfortunately, I couldn't find a replacement for it among the Airflow Amazon provider operators, and the command doesn't appear to be supported by boto or boto3 (GitHub issue 1, issue 2).


Solution

  • To install the AWS CLI on Cloud Composer, you can use the following steps:

    1. Create a new Airflow DAG and add a BashOperator task.
    2. In the BashOperator task, use the following command to install the AWS CLI:
    pip install awscli
    
    3. Configure the AWS CLI with your AWS credentials. You can do this by adding the following commands to the BashOperator task (for anything beyond a quick test, prefer storing credentials in an Airflow Connection or a secrets backend rather than hardcoding them in the DAG):
    aws configure set aws_access_key_id YOUR_AWS_ACCESS_KEY_ID
    aws configure set aws_secret_access_key YOUR_AWS_SECRET_ACCESS_KEY
    
    4. Configure the AWS CLI to use your chosen AWS region. You can do this by adding the following command to the BashOperator task:
    aws configure set default.region AWS_REGION
    

    Replace AWS_REGION with the name of a supported AWS region.

    5. Save and run the DAG.

    Once the DAG has run, the AWS CLI will be installed only on the worker that happened to execute the task, and the installation may not survive worker restarts or autoscaling. For a persistent install that is available on every worker, use Cloud Composer's supported mechanism for adding PyPI packages to the environment; after that, you can use the AWS CLI in any of your Airflow DAGs.
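
    Separately, Cloud Composer offers a supported way to install a PyPI package for the whole environment. As a sketch of that environment-configuration command (the environment name and location below are placeholders you must replace):

    ```shell
    # Install awscli on every Airflow worker in the Composer environment.
    # YOUR_ENVIRONMENT and YOUR_LOCATION are placeholders for your own values.
    gcloud composer environments update YOUR_ENVIRONMENT \
        --location YOUR_LOCATION \
        --update-pypi-package awscli
    ```

    The update restarts the environment's workers, so expect it to take some time to complete.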

    Note that Cloud Composer environments already ship with Python and pip preinstalled, so you do not need to install pip yourself; you also cannot run sudo apt commands on Composer's managed workers.
    

    Here is an example of a complete Airflow DAG that installs the AWS CLI and configures it with your AWS credentials:

    from airflow import DAG
    from airflow.operators.bash import BashOperator  # Airflow 2 import path
    from airflow.utils.dates import days_ago
    
    default_args = {
        'start_date': days_ago(2),
        'retries': 1
    }
    
    dag = DAG('install_aws_cli', default_args=default_args, schedule_interval=None)
    
    install_aws_cli = BashOperator(
        task_id='install_aws_cli',
        bash_command='pip install awscli',
        dag=dag
    )
    
    # bash_command takes a single string, not a list; chain commands with &&
    configure_aws_cli = BashOperator(
        task_id='configure_aws_cli',
        bash_command=(
            'aws configure set aws_access_key_id YOUR_AWS_ACCESS_KEY_ID && '
            'aws configure set aws_secret_access_key YOUR_AWS_SECRET_ACCESS_KEY && '
            'aws configure set default.region AWS_REGION'
        ),
        dag=dag
    )
    
    install_aws_cli >> configure_aws_cli
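
    Since the original goal was aws s3 sync, here is a minimal sketch of how the installed CLI could be wired into a sync task. The helper function and the bucket paths are hypothetical, not part of any Airflow or AWS API:

    ```python
    # Hypothetical helper: build an `aws s3 sync` command line for a BashOperator.
    # The bucket paths used with it are placeholders, not real buckets.
    def build_s3_sync_command(source: str, dest: str, delete: bool = False) -> str:
        """Return the shell command that mirrors `source` into `dest`."""
        cmd = f"aws s3 sync {source} {dest}"
        if delete:
            # --delete removes files in dest that no longer exist in source
            cmd += " --delete"
        return cmd

    # In the DAG above this could be used as:
    # sync_bucket = BashOperator(
    #     task_id='sync_bucket',
    #     bash_command=build_s3_sync_command('s3://SOURCE_BUCKET', 's3://DEST_BUCKET'),
    #     dag=dag,
    # )
    ```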
    

    Once you have saved the DAG, you can trigger it from the command line (Airflow 2 syntax):

    airflow dags trigger install_aws_cli
    

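
    Finally, a hedged alternative to running aws configure at all: the AWS CLI also reads credentials from environment variables, which you can pass through BashOperator's env parameter. The helper name below is illustrative, not an Airflow API:

    ```python
    # Illustrative helper: the environment variables the AWS CLI reads natively,
    # to be passed via BashOperator(env=...) instead of running `aws configure`.
    def aws_cli_env(access_key: str, secret_key: str, region: str) -> dict:
        return {
            "AWS_ACCESS_KEY_ID": access_key,
            "AWS_SECRET_ACCESS_KEY": secret_key,
            "AWS_DEFAULT_REGION": region,
        }

    # Usage sketch (note: BashOperator's env replaces the task's entire
    # environment, so also include PATH if your command depends on it):
    # sync = BashOperator(
    #     task_id='s3_sync',
    #     bash_command='aws s3 sync s3://SOURCE_BUCKET s3://DEST_BUCKET',
    #     env=aws_cli_env('KEY_ID', 'SECRET', 'us-east-1'),
    #     dag=dag,
    # )
    ```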