
Generating a temporary file on Cloud Composer and uploading it to GCS


I basically have two tasks in a Cloud Composer DAG:

  1. Generate a local file at /tmp/abc.csv
  2. Upload /tmp/abc.csv to GCS

When task 2 runs, it sometimes fails with an error saying the file cannot be found. I suspect that Cloud Composer is cleaning the /tmp folder. Is there another way to do this? Should I store the file in a different location?


Solution

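  • As a best practice, treat the local filesystem as ephemeral: a file is only guaranteed to exist for the lifetime of the task instance that wrote it. A later task may run on a different worker, where /tmp/abc.csv was never created.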

    However, Cloud Composer synchronises certain folders of the environment's Cloud Storage bucket:

    "When you modify DAGs or plugins in the Cloud Storage bucket, Cloud Composer synchronizes the data across all the nodes in the cluster. The data/ and logs/ folders synchronize bi-directionally by using Cloud Storage FUSE." (docs)

    So you could use the data/ folder for what you are describing. Any file there can be overwritten by other DAG or task executions, so make sure you use a path that won't conflict with other task executions, and that the same task that depends on those files is not running in parallel.
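
One way to avoid path conflicts is to build a separate path per DAG run. The following is a minimal sketch, not a definitive implementation. It assumes the bucket's data/ folder is mounted on the workers at /home/airflow/gcs/data; the helper names unique_data_path and write_csv are hypothetical, and the DAG, task and run identifiers are assumed to be passed in from the task's context.

```python
import csv
import os
import re

# Assumption: the environment bucket's data/ folder is mounted here on the workers.
COMPOSER_DATA_DIR = "/home/airflow/gcs/data"


def unique_data_path(dag_id, task_id, run_id, filename, base_dir=COMPOSER_DATA_DIR):
    """Build a per-run path so parallel runs do not overwrite each other's files."""
    # A run id can contain characters such as ':' or '+'; replace them with '_'.
    safe_run_id = re.sub(r"[^A-Za-z0-9_.-]", "_", run_id)
    return os.path.join(base_dir, dag_id, task_id, safe_run_id, filename)


def write_csv(path, rows):
    """Write rows to path as CSV, creating parent directories first."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path
```

Task 1 would call write_csv(unique_data_path(...), rows), and task 2 would rebuild the same path from the same identifiers before uploading. Since data/ is itself synchronised to the environment's bucket, a separate upload may only be needed when the file has to end up in a different bucket or location.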