Tags: python-3.x, google-colaboratory, python-s3fs

Downloading S3 files in Google Colab


I am working on a project where some of the data is provided through an S3FileSystem. I can read that data with S3FileSystem.open(path), but there are more than 360 files and it takes at least 3 minutes to read a single one. I was wondering: is there any way of downloading these files to my system and reading them from there, instead of reading them directly from the S3FileSystem? There is another reason as well: although I can read all those files, once my Colab session reconnects I have to re-read them all, which takes a lot of time. I am using the following code to read the files:

import s3fs
import xarray as xr

fs_s3 = s3fs.S3FileSystem(anon=True)   # anonymous access to a public bucket
s3path = 'file_name'                   # path to the remote file
remote_file_obj = fs_s3.open(s3path, mode='rb')
ds = xr.open_dataset(remote_file_obj, engine='h5netcdf')

Is there any way of downloading those files?


Solution

  • You can use another tool, also called s3fs (s3fs-fuse), to mount the bucket as a local directory, then copy the files to Colab; a sketch of the mount commands follows below.

    how to mount

    After mounting, you can copy files with:

    !cp /s3/yourfile.zip /content/
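
    A minimal mount sketch for Colab, assuming the s3fs-fuse tool (a separate project from the Python s3fs package) and a hypothetical public bucket named my-bucket; the package name, mount point, and options here are assumptions, not from the original answer:

    # install s3fs-fuse (the FUSE tool, not the Python package)
    !apt-get -qq install -y s3fs
    # create a mount point and mount the public bucket anonymously, read-only
    !mkdir -p /s3
    !s3fs my-bucket /s3 -o public_bucket=1 -o ro
    # copy a file from the mounted bucket to Colab's local disk
    !cp /s3/yourfile.zip /content/

  • Alternatively, the Python s3fs library you are already using can download files directly with get(), with no mounting needed. A minimal sketch, assuming a hypothetical remote path 'my-bucket/data/' that holds the NetCDF files (replace with your real paths):

    import s3fs
    import xarray as xr

    fs_s3 = s3fs.S3FileSystem(anon=True)
    # download the whole directory of ~360 files to local disk once
    fs_s3.get('my-bucket/data/', '/content/data/', recursive=True)
    # afterwards, open files locally instead of over the network
    ds = xr.open_dataset('/content/data/file_name.nc', engine='h5netcdf')

    Note that files in /content/ are lost when the Colab VM resets, so after a session reconnect you would re-run the download (or save the files to mounted Google Drive to persist them across sessions).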