Tags: python, asynchronous, amazon-s3, bigdata, minio

How to read zarr files correctly from minio?


I want to read a big zarr file from my MinIO (S3) server; however, all three methods I have tried crash:

import xarray as xr
import zarr

import hydrodata.configs.config as conf

# Method 1 (crash report: https://pastebin.com/vkM1M3VV)
zarr_path = await conf.FS.open_async('s3://datasets-origin/usgs_streamflow_nldas_hourly.zarr')
zds = xr.open_dataset(zarr_path, engine='zarr')

# Method 2 (crash report: https://pastebin.com/fKKECf3U)
zarr_path = conf.FS.get_mapper('s3://datasets-origin/usgs_streamflow_nldas_hourly.zarr')
wrapped_store = zarr.storage.KVStore(zarr_path)
zds = xr.open_zarr(wrapped_store)

# Method 3 (crashes with AttributeError: __enter__)
with conf.FS.open_async('s3://datasets-origin/usgs_streamflow_nldas_hourly.zarr') as zarr_path:
    zds = xr.open_dataset(zarr_path)

And this is conf.FS:

import s3fs

FS = s3fs.S3FileSystem(
    client_kwargs={"endpoint_url": MINIO_PARAM["endpoint_url"]},
    key=MINIO_PARAM["key"],
    secret=MINIO_PARAM["secret"],
    use_ssl=False,
)
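
A quick way to rule out a plain connectivity problem is to list the bucket with the same filesystem object. A minimal sanity check using the conf.FS defined above (the .zgroup key below assumes a zarr v2 store):

# Lists the bucket contents; fails fast if the endpoint or credentials are wrong
print(conf.FS.ls("s3://datasets-origin"))

# The root of a zarr v2 group stores its metadata under a .zgroup key
print(conf.FS.exists("s3://datasets-origin/usgs_streamflow_nldas_hourly.zarr/.zgroup"))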

How can I fix these methods so that I can read the data correctly?

———————————————————————————————————

This is the crash report from Method 2:

name = 'xarray.core.daskmanager'
import_ = <function _gcd_import at 0x7fe2aabbb400>
 
>   ???
E   ModuleNotFoundError: No module named 'xarray.core.daskmanager'

However, I had already run pip install xarray[complete] and conda install -c conda-forge xarray dask netCDF4 bottleneck before this, so where is the problem? This is my pip list: https://pastebin.com/BUbcNqtT
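
One way to compare the two installs is xarray's built-in version report, which shows whether the dask and zarr versions actually match what pip list claims:

import xarray as xr

# Prints the versions of xarray and its optional dependencies (dask, zarr, s3fs, ...)
xr.show_versions()

xarray.core.daskmanager is an internal xarray module in some releases, so a ModuleNotFoundError for it would be consistent with a mismatched or partially upgraded xarray/dask install.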


Solution

  • I redid this on another computer and, in the end, found that the problem could not be reproduced there, so I am closing the question.
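
  • For reference, the usually recommended pattern for reading a zarr store from s3fs-compatible storage is to hand xarray a key-value mapper rather than an open file, since a zarr store is a directory tree of objects, not a single file. A minimal sketch, assuming the same MINIO_PARAM and store path as in the question:

import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(
    client_kwargs={"endpoint_url": MINIO_PARAM["endpoint_url"]},
    key=MINIO_PARAM["key"],
    secret=MINIO_PARAM["secret"],
    use_ssl=False,
)

# get_mapper() exposes the store as a key-value mapping that
# xr.open_zarr() accepts directly; no KVStore wrapper is needed
mapper = fs.get_mapper("s3://datasets-origin/usgs_streamflow_nldas_hourly.zarr")
zds = xr.open_zarr(mapper)

This would also explain the AttributeError: __enter__ in Method 3: calling open_async without await returns a coroutine, which has no __enter__ and therefore cannot be used in a with statement.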