Tags: python, pandas, dataframe, python-xarray, netcdf4

I want to select data by longitude and latitude ranges from a NetCDF4 file using Python on Windows, but I can't even open the dataset with xarray.


I hoped that converting it into a DataFrame would help, since I had planned to use the between method on the resulting DataFrame. But the conversion itself does not work, even with xarray. I have another reason to want a DataFrame: using pandas with statsmodels.
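
For illustration, a between-based filter of that kind might look like the sketch below. The rows are made up, the column names lat, lon and tempanomaly are assumed to match what a long-format conversion of the file would give, and the bounds are arbitrary.

```python
import pandas as pd

# Made-up rows in long format (one row per grid cell); not real GISTEMP values.
df = pd.DataFrame(
    {
        "lat": [-11.0, -9.0, 1.0, 9.0, 11.0],
        "lon": [-85.0, -83.0, -81.0, -79.0, -85.0],
        "tempanomaly": [0.1, 0.2, 0.3, 0.4, 0.5],
    }
)

# Series.between is inclusive on both ends by default.
in_box = df["lon"].between(-85.0, -80.1) & df["lat"].between(-10.0, 10.0)
subset = df[in_box]
print(subset)
```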

Since I am on Windows, I can't use nctoolkit.

#imports
import requests
import netCDF4 as nc
import gzip
#
# Download the gzipped datafile from GISS and load it into a netCDF Dataset
GISTEMPfile = requests.get('https://data.giss.nasa.gov/pub/gistemp/gistemp250_GHCNv4.nc.gz')
ds = nc.Dataset("dummy_path", mode="r", memory=gzip.decompress(GISTEMPfile.content))
ds.variables
v = ds.variables['tempanomaly']
#
#from pandas import DataFrame
#df = DataFrame(ds)
#
import xarray as xr
temp = xr.open_dataset(ds)
df = temp.to_dataframe()

For some reason, I am going in circles...

Trying with the 'netcdf4' engine:

temp = xr.open_dataset(ds, engine='netcdf4')

  File "C:\Users\jean-\Desktop\WPy64-31050\python-3.10.5.amd64\lib\site-packages\xarray\backends\api.py", line 495, in open_dataset
    backend_ds = backend.open_dataset(

  File "C:\Users\jean-\Desktop\WPy64-31050\python-3.10.5.amd64\lib\site-packages\xarray\backends\netCDF4_.py", line 553, in open_dataset
    store = NetCDF4DataStore.open(

  File "C:\Users\jean-\Desktop\WPy64-31050\python-3.10.5.amd64\lib\site-packages\xarray\backends\netCDF4_.py", line 355, in open
    raise ValueError(

ValueError: can only read bytes or file-like objects with engine='scipy' or 'h5netcdf'

Trying with the 'h5netcdf' engine:

temp = xr.open_dataset(ds, engine='h5netcdf')

  File "C:\Users\jean-\Desktop\WPy64-31050\python-3.10.5.amd64\lib\site-packages\xarray\backends\api.py", line 481, in open_dataset
    backend = plugins.get_backend(engine)

  File "C:\Users\jean-\Desktop\WPy64-31050\python-3.10.5.amd64\lib\site-packages\xarray\backends\plugins.py", line 156, in get_backend
    raise ValueError(

ValueError: unrecognized engine h5netcdf must be one of: ['netcdf4', 'scipy', 'store', 'zarr']

After upgrading to WinPython 3.11.3.1, I tried again without specifying the engine, to no avail:

temp = xr.open_dataset(ds)
Traceback (most recent call last):

  Cell In[3], line 1
    temp = xr.open_dataset(ds)

  File ~\Desktop\WPy64-31131\python-3.11.3.amd64\Lib\site-packages\xarray\backends\api.py:509 in open_dataset
    engine = plugins.guess_engine(filename_or_obj)

  File ~\Desktop\WPy64-31131\python-3.11.3.amd64\Lib\site-packages\xarray\backends\plugins.py:197 in guess_engine
    raise ValueError(error_msg)

ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'h5netcdf', 'scipy', 'zarr']. Consider explicitly selecting one of the installed engines via the engine parameter, or installing additional IO dependencies

Are there dependencies that I should be aware of?


Solution

  • xarray supports range-based selection on coordinates with the sel method and slice objects, so the selection itself does not require converting to a DataFrame. For example:

    ds = xr.open_dataset(...)
    ds_region = ds.sel(lon=slice(-85.0, -80.1), lat=slice(-10.0, 10.0))
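
The errors in the question most likely come from passing a netCDF4.Dataset object directly to xr.open_dataset, which expects a path, a bytes or file-like object, or a data store. A minimal sketch of one way around that, assuming the netCDF4 backend is installed: wrap the in-memory dataset in xr.backends.NetCDF4DataStore, select with sel, and convert only the selected region to a DataFrame. The helper names open_in_memory_netcdf and select_region are hypothetical, and the demo runs on a synthetic grid (assumed 2-degree cells with ascending lat and lon, random values) rather than the real GISTEMP file.

```python
import numpy as np
import pandas as pd
import xarray as xr


def open_in_memory_netcdf(raw_bytes):
    """Open uncompressed NetCDF bytes with xarray via a netCDF4 data store.

    Hypothetical helper. netCDF4 is imported here so the rest of the
    sketch still runs where netCDF4 is not installed.
    """
    import netCDF4 as nc

    nc_ds = nc.Dataset("dummy_path", mode="r", memory=raw_bytes)
    # Pass a data store to xr.open_dataset, not the bare netCDF4.Dataset.
    return xr.open_dataset(xr.backends.NetCDF4DataStore(nc_ds))


def select_region(dataset, lon_range, lat_range):
    """Return the part of `dataset` inside the lon/lat ranges (inclusive)."""
    # Label-based slices follow the coordinate order; this assumes
    # ascending lon and lat coordinates.
    return dataset.sel(lon=slice(*lon_range), lat=slice(*lat_range))


# Synthetic stand-in for the real grid: assumed layout, made-up values.
lat = np.arange(-89.0, 90.0, 2.0)
lon = np.arange(-179.0, 180.0, 2.0)
time = pd.date_range("2000-01-01", periods=3, freq="MS")
rng = np.random.default_rng(0)
demo = xr.Dataset(
    {"tempanomaly": (("time", "lat", "lon"),
                     rng.normal(size=(time.size, lat.size, lon.size)))},
    coords={"time": time, "lat": lat, "lon": lon},
)

region = select_region(demo, (-85.0, -80.1), (-10.0, 10.0))

# Long-format DataFrame with columns time, lat, lon, tempanomaly.
df = region["tempanomaly"].to_dataframe().reset_index()
print(df.head())
```

With the real file, something like select_region(open_in_memory_netcdf(gzip.decompress(GISTEMPfile.content)), ...) would take the place of demo. If a coordinate were stored in descending order, the slice bounds would need to be given in that same order.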