Search code examples
pythonmetpypython-siphon

How do I get arrays of coordinate values for a variable from a netCDF gridded dataset using Siphon and MetPy?


I have requested a netCDF subset using Siphon and formed a query to retrieve a variable within a bounding box:

from siphon.catalog import TDSCatalog
cat = TDSCatalog("https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_onedeg/catalog.xml?dataset=grib/NCEP/GFS/Global_onedeg/Best")
ncss = cat.datasets[0].subset()
query = ncss.query()
query.variables("Absolute_vorticity_isobaric")
query.lonlat_box(north=34., south=33., west=-102., east=-101.)
query.accept("netcdf4")

I am seeking a reliable, concise approach to getting the values of that variable's coordinates, specifically time and vertical level. A working, but impractical approach to this would be to request and work with the whole dataset.

Functional-but-Impractical Approach

Get the data

import xarray as xr
query.all_times()
data = ncss.get_data(query)
datastore = xr.backends.NetCDF4DataStore(data)

Get data as xarray.Dataset using MetPy's xarray accessor

ds = xr.open_dataset(datastore).metpy.parse_cf()

Get the coordinate axes from a constituent xarray.DataArray

For each variable of the dataset as an xarray.DataArray, calling ds.VARIABLE.metpy.DIMENSION has MetPy automatically return the appropriate coordinate variable (no matter what it is named, e.g. lat, lon, time, time1, altitude_above_msl, isobaric3, height_above_ground1), where DIMENSION is one of time, vertical, x, and y.

Get the values

In this case, ds.Absolute_vorticity_isobaric.metpy.time returns ds.time, and ds.Absolute_vorticity_isobaric.metpy.vertical returns ds.isobaric2. Adding .values to the call returns just the numpy.ndarray with the values I have been trying to get. So, calling ds.Absolute_vorticity_isobaric.metpy.time.values produces the following (which is truncated below):

array(['2019-11-17T00:00:00.000000000', '2019-11-17T03:00:00.000000000',
       '2019-11-17T06:00:00.000000000', ..., '2020-01-02T06:00:00.000000000',
       '2020-01-02T09:00:00.000000000', '2020-01-02T12:00:00.000000000'],
      dtype='datetime64[ns]')

Calling ds.Absolute_vorticity_isobaric.metpy.time.values and ds.Absolute_vorticity_isobaric.metpy.vertical.values will return just the NumPy arrays, which is what I seek.

The Problem

While the above does in fact do what I want, it took nearly a minute and a half to run for just one variable, and it (I assume) unnecessarily taxes UCAR servers. Is there any way to get the output above without the massive overhead of loading all of that data itself?


Solution

  • If you are concerned about the performance of your original method, and only wish to extract the time and vertical coordinates, I would recommend using OPENDAP to access your data rather than NCSS. This will simply fetch the metadata at first, and then will lazily load the data that you request (time and vertical coordinates, in your case). Using MetPy v0.11 or newer, an example script using your TDS Catalog of interest would look something like the following:

    import metpy
    import xarray as xr
    
    from siphon.catalog import TDSCatalog
    
    cat = TDSCatalog("https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_onedeg/catalog.xml?dataset=grib/NCEP/GFS/Global_onedeg/Best")
    opendap_url = cat.datasets[0].access_urls['OPENDAP']
    ds = xr.open_dataset(opendap_url)
    
    time = ds['Absolute_vorticity_isobaric'].metpy.time.values
    vertical = ds['Absolute_vorticity_isobaric'].metpy.vertical.values
    
    print(time)
    print(vertical)
    

    This takes roughly a half-second to run on my system.

    If you instead have MetPy older than v0.11, you will need to use .metpy.parse_cf() when opening the dataset, as follows:

    ds = xr.open_dataset(opendap_url).metpy.parse_cf()