Tags: amazon-s3, netcdf, python-xarray, python-s3fs

Can you use xr.open_mfdataset when reading files from S3 via s3fs?


I'm trying to read multiple netCDF files at once with xr.open_mfdataset from an S3 bucket, using s3fs. Is this possible?

I tried the code below, which works with xr.open_dataset for a single file, but it doesn't work for multiple files:

import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=False)
s3path = 's3://my-bucket/wind_data*'
store = s3fs.S3Map(root=s3path, s3=s3fs.S3FileSystem(), check=False)

data = xr.open_mfdataset(store, combine='by_coords')

Solution

  • I'm not sure exactly what S3Map does; the s3fs documentation isn't specific on this point.

    However, I was able to create a working implementation of this in a Jupyter environment using S3FileSystem.glob() and S3FileSystem.open().

    Here's a code sample:

    import s3fs
    import xarray as xr
    
    
    s3 = s3fs.S3FileSystem(anon=False)
    
    # Glob the bucket for matching object paths (returned without the 's3://' prefix)
    s3path = 's3://your-bucket/your-folder/file_prefix*'
    remote_files = s3.glob(s3path)
    
    # Open each matching object as a file-like object
    fileset = [s3.open(file) for file in remote_files]
    
    # open_mfdataset accepts the list of file-like objects and combines them
    data = xr.open_mfdataset(fileset, combine='by_coords')
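
    If you do this often, the same pattern can be wrapped in a small helper. The snippet below is a minimal sketch, not part of the original answer: the function name open_s3_mfdataset is made up, and it assumes the files are netCDF4/HDF5 and that the h5netcdf and dask packages are installed (needed for engine='h5netcdf' and parallel=True).

    import s3fs
    import xarray as xr
    
    
    def open_s3_mfdataset(pattern, anon=False):
        """Open all S3 objects matching a glob pattern as one xarray dataset."""
        s3 = s3fs.S3FileSystem(anon=anon)
        remote_files = s3.glob(pattern)              # paths come back without 's3://'
        fileset = [s3.open(path) for path in remote_files]
        # h5netcdf can read netCDF4/HDF5 data from file-like objects;
        # parallel=True lets dask open the files concurrently.
        return xr.open_mfdataset(fileset, combine='by_coords',
                                 engine='h5netcdf', parallel=True)
    
    
    data = open_s3_mfdataset('s3://your-bucket/your-folder/file_prefix*')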