bash, curl, google-cloud-composer

curl command in BashOperator in Cloud Composer


I am following the tutorial in download_rocket_launches.py. Since I am running it in Cloud Composer, I want curl to write its output to a native path, i.e. /home/airflow/gcs/dags, but the task fails with a "path not found" error.

What path can I give for this command to work? Here is the task I am trying to execute:

download_launches = BashOperator(
    task_id="download_launches",
    bash_command="curl -o /tmp/launches.json -L 'https://ll.thespacedevs.com/2.0.0/launch/upcoming'",  # noqa: E501
    dag=dag,
)

Solution

  • This worked on my end:

    import json
    import pathlib
    
    import airflow.utils.dates
    import requests
    import requests.exceptions as requests_exceptions
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator
    
    dag = DAG(
        dag_id="download_rocket_launches",
        description="Download rocket pictures of recently launched rockets.",
        start_date=airflow.utils.dates.days_ago(14),
        schedule_interval="@daily",
    )
    
    download_launches = BashOperator(
        task_id="download_launches",
        bash_command="curl -o /home/airflow/gcs/data/launches.json -L 'https://ll.thespacedevs.com/2.0.0/launch/upcoming' ",  # note the space between the closing single quote and the double quote
        dag=dag,
    )
    
    
    download_launches 
    


    On my end, the key was to put a space between the closing single quote ' and the closing double quote " at the end of the bash command.

    Also, the GCP documentation recommends writing task output to the data folder:

    gs://bucket-name/data, mapped to /home/airflow/gcs/data: Stores the data that tasks produce and use. This folder is mounted on all worker nodes.
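
As a rough sketch of the folder mapping quoted above, the snippet below builds the curl command from the mounted data directory and shows which gs:// object the output file should correspond to. The helper names `build_curl_command` and `to_gcs_uri` are hypothetical and not part of Airflow or Cloud Composer. `bucket-name` is the placeholder from the documentation, and the `/home/airflow/gcs/` mount root is an assumption generalized from the data folder row.

```python
import posixpath
import shlex

# Local mount point of the bucket's data/ folder, as quoted from the GCP docs.
DATA_DIR = "/home/airflow/gcs/data"


def build_curl_command(url: str, filename: str, data_dir: str = DATA_DIR) -> str:
    """Return a curl command that saves `url` to `filename` inside `data_dir`."""
    output_path = posixpath.join(data_dir, filename)
    # shlex.quote guards against spaces or shell metacharacters in either value.
    # The trailing space mirrors the bash_command used in the answer.
    return f"curl -o {shlex.quote(output_path)} -L {shlex.quote(url)} "


def to_gcs_uri(local_path: str, bucket: str) -> str:
    """Map a path under the assumed mount root to its gs:// URI."""
    mount_root = "/home/airflow/gcs/"  # assumption: bucket root is mounted here
    if not local_path.startswith(mount_root):
        raise ValueError(f"{local_path!r} is not under {mount_root}")
    return f"gs://{bucket}/{local_path[len(mount_root):]}"


command = build_curl_command(
    "https://ll.thespacedevs.com/2.0.0/launch/upcoming", "launches.json"
)
print(command)
print(to_gcs_uri("/home/airflow/gcs/data/launches.json", "bucket-name"))
```

The returned string could be passed as `bash_command=command` to the BashOperator, and the printed gs:// URI is where the downloaded file would be expected to show up in the environment's bucket.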