google-cloud-storage google-cloud-composer airflow-2.x gcsfuse

Airflow unable to mount a google cloud bucket using gcsfuse


I want to mount a Google Cloud Storage bucket into my Airflow environment so that I can read and write files in that bucket. I am using Cloud Composer 2 (image composer-2.1.14-airflow-2.5.1).

In Airflow, I created a DAG that runs the following bash script:

#!/bin/bash

BUCKET="my-bucket"
MOUNT_DIR="/home/airflow/gcs/data/my-bucket"

# Create the $MOUNT_DIR directory and make it group-writable
mkdir -p "$MOUNT_DIR"
sudo chmod g+w "$MOUNT_DIR"

# Mount the GCS bucket (foreground, with all debug output enabled)
gcsfuse --foreground --debug_fuse --debug_fs --debug_gcs --debug_http -o nonempty "$BUCKET" "$MOUNT_DIR"

Here are the logs from Airflow:

[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Start gcsfuse/0.42.3 (Go version go1.19.5) for app "" using mount point: /home/airflow/gcs/data/my-bucket
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Opening GCS connection...
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Creating a mount at "/home/airflow/gcs/data/my-bucket"
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Creating a new server...
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Set up root directory for bucket my-bucket
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - gcs: Req              0x0: <- ListObjects("")
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - gcs: Req              0x0: -> ListObjects("") (131.395831ms): OK
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Mounting file system "my-bucket"...
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Beginning the mounting kickoff process
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Parsing fuse file descriptor
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Preparing for direct mounting
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Directmount failed. Trying fallback.
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Creating a socket pair
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Creating files to wrap the sockets
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Starting fusermount/os mount
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - /usr/bin/fusermount: fuse device not found, try 'modprobe fuse' first
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Error while mounting gcsfuse: mountWithConn: Mount: mount: running /usr/bin/fusermount: exit status 1
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - mountWithArgs: mountWithConn: Mount: mount: running /usr/bin/fusermount: exit status 1

I already verified that Airflow can access the bucket by running the following command and I see the list of files in the bucket:

gsutil ls gs://$BUCKET

I also tried running the following command, and I still get the same error as above:

sudo mount -t gcsfuse -o rw,user $BUCKET $MOUNT_DIR

I have consulted several pages on this topic, but I am still not able to mount the bucket.

Update: I upgraded the Composer environment to composer-2.4.2-airflow-2.5.3 and I still see the following error:

[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Start gcsfuse/1.0.1 (Go version go1.20.5) for app \"\" using mount point:/home/airflow/gcs/data/my-bucket\n","timestampSeconds":1695254138,"timestampNanos":83062812}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Opening GCS connection...\n","timestampSeconds":1695254138,"timestampNanos":83799366}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Creating a mount at \"/home/airflow/gcs/data/datavant/my-bucket\"\n","timestampSeconds":1695254138,"timestampNanos":87562370}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Creating a new server...\n","timestampSeconds":1695254138,"timestampNanos":87589651}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Set up root directory for bucket my-bucket\n","timestampSeconds":1695254138,"timestampNanos":87599362}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"gcs: Req              0x0: \u003c- ListObjects(\"\")\n","timestampSeconds":1695254138,"timestampNanos":87612220}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"gcs: Req              0x0: -\u003e ListObjects(\"\") (106.665835ms): OK\n","timestampSeconds":1695254138,"timestampNanos":194287578}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Mounting file system \"my-bucket\"...\n","timestampSeconds":1695254138,"timestampNanos":194342795}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Beginning the mounting kickoff process\n","timestampSeconds":1695254138,"timestampNanos":194916407}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Parsing fuse file descriptor\n","timestampSeconds":1695254138,"timestampNanos":194977401}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Preparing for direct mounting\n","timestampSeconds":1695254138,"timestampNanos":194984093}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Directmount failed. Trying fallback.\n","timestampSeconds":1695254138,"timestampNanos":195003380}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Creating a socket pair\n","timestampSeconds":1695254138,"timestampNanos":195238613}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Creating files to wrap the sockets\n","timestampSeconds":1695254138,"timestampNanos":195260643}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Starting fusermount/os mount\n","timestampSeconds":1695254138,"timestampNanos":195270306}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - /usr/bin/fusermount: fuse device not found, try 'modprobe fuse' first
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Error while mounting gcsfuse: mountWithConn: Mount: mount: running /usr/bin/fusermount: exit status 1\n","timestampSeconds":1695254138,"timestampNanos":198067902}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - mountWithArgs: mountWithConn: Mount: mount: running /usr/bin/fusermount: exit status 1
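
The decisive line in both logs is `fusermount: fuse device not found`. That message usually means the process cannot open the FUSE device node, which is common inside containers that do not expose it. A minimal check, assuming the standard Linux device path `/dev/fuse` (the function name `fuse_device_usable` is made up for this sketch), could look like this:

```shell
#!/bin/bash

# Returns 0 if the given path is a character device that the current
# user can read and write. Defaults to /dev/fuse (assumed standard path).
fuse_device_usable() {
  local dev="${1:-/dev/fuse}"
  [ -c "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]
}

if fuse_device_usable; then
  echo "FUSE device is present: a gcsfuse mount may be possible here"
else
  echo "No usable /dev/fuse: gcsfuse cannot mount in this environment"
fi
```

If the second message is printed from inside a task, no combination of gcsfuse flags is likely to help, because the failure happens before gcsfuse talks to the bucket at all.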

Solution

  • It is not possible to mount another bucket in a Cloud Composer Airflow environment. I confirmed this with Google support.

    The workaround was to copy the files I needed into the environment's own bucket (the one that holds the Airflow data such as DAGs) and use that as the local filesystem.
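
The copy step of that workaround can be sketched as follows. This is only an illustration under stated assumptions: `my-composer-env-bucket` is a hypothetical placeholder for the environment's bucket, the helper names are made up, and it assumes the usual Composer layout in which the bucket's `data/` folder is visible to tasks under `/home/airflow/gcs/data`. By default the script only prints what it would do.

```shell
#!/bin/bash

# Build the gsutil command that mirrors a source bucket into the
# data/ folder of the Composer environment's bucket.
build_sync_cmd() {
  local src_bucket="$1" env_bucket="$2"
  printf 'gsutil -m rsync -r gs://%s gs://%s/data/%s' \
    "$src_bucket" "$env_bucket" "$src_bucket"
}

# Path under which the mirrored files would be visible to Airflow tasks
# (assumes data/ is mapped to /home/airflow/gcs/data).
local_path_for() {
  printf '/home/airflow/gcs/data/%s' "$1"
}

SRC_BUCKET="my-bucket"
ENV_BUCKET="my-composer-env-bucket"   # hypothetical: your environment's bucket

echo "Would run: $(build_sync_cmd "$SRC_BUCKET" "$ENV_BUCKET")"
echo "Files would then be under: $(local_path_for "$SRC_BUCKET")"

# Set RUN_SYNC=1 to actually copy (needs gsutil and access to both buckets).
if [ "${RUN_SYNC:-0}" = "1" ]; then
  gsutil -m rsync -r "gs://${SRC_BUCKET}" "gs://${ENV_BUCKET}/data/${SRC_BUCKET}"
fi
```

Note that this is a one-off copy rather than a live mount: changes made in the source bucket afterwards are not reflected until the sync is run again.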