I need to install AWS CLI tool on Google Cloud Composer to be able to use it with BashOperator from Airflow DAGs.
AWS CLI documentation page explains how to install it as a package, but Cloud Composer doesn't have a supported way to install apt packages on all instances.
My motivation. I need to synchronize a large S3 bucket with another storage. The command aws s3 sync
(link) suits perfectly for this. Unfortunately, I didn't find a replacement for this command in Airflow Amazon provider operators. Also, it seems this command is not supported by boto and boto3 (github issue 1, issue 2).
To install the AWS CLI on Cloud Composer, you can use the following steps:
pip install awscli
aws configure set aws_access_key_id YOUR_AWS_ACCESS_KEY_ID
aws configure set aws_secret_access_key YOUR_AWS_SECRET_ACCESS_KEY
aws configure set default.region AWS_REGION
Replace AWS_REGION
with the name of a supported AWS region.
Once the DAG has run, the AWS CLI will be installed on all of the Airflow worker nodes in your Cloud Composer environment. You can then use the AWS CLI in your Airflow DAGs to interact with AWS services.
Note that you will need to have the pip
Python package installed on your Cloud Composer environment in order to use the above steps. You can install pip
using the following command:
sudo apt install python-pip
Here is an example of a complete Airflow DAG that installs the AWS CLI and configures it with your AWS credentials:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
default_args = {
'start_date': airflow.utils.dates.days_ago(2),
'retries': 1
}
dag = DAG('install_aws_cli', default_args=default_args)
install_aws_cli = BashOperator(
task_id='install_aws_cli',
bash_command='pip install awscli',
dag=dag
)
configure_aws_cli = BashOperator(
task_id='configure_aws_cli',
bash_command=[
'aws configure set aws_access_key_id YOUR_AWS_ACCESS_KEY_ID',
'aws configure set aws_secret_access_key YOUR_AWS_SECRET_ACCESS_KEY',
'aws configure set default.region AWS_REGION'
],
dag=dag
)
install_aws_cli >> configure_aws_cli
Once you have saved the DAG, you can run it using the following command:
airflow run install_aws_cli
Once the DAG has run, the AWS CLI will be installed on all of the Airflow worker nodes in your Cloud Composer environment. You can then use the AWS CLI in your Airflow DAGs to interact with AWS services.