I have requested a netCDF subset using Siphon and formed a query to retrieve a variable within a bounding box:
from siphon.catalog import TDSCatalog
cat = TDSCatalog("https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_onedeg/catalog.xml?dataset=grib/NCEP/GFS/Global_onedeg/Best")
ncss = cat.datasets[0].subset()
query = ncss.query()
query.variables("Absolute_vorticity_isobaric")
query.lonlat_box(north=34., south=33., west=-102., east=-101.)
query.accept("netcdf4")
I am seeking a reliable, concise approach to getting the values of that variable's coordinates, specifically time and vertical level. A working, but impractical approach to this would be to request and work with the whole dataset.
import xarray as xr
query.all_times()
data = ncss.get_data(query)
datastore = xr.backends.NetCDF4DataStore(data)
xarray.Dataset
using MetPy's xarray accessords = xr.open_dataset(datastore).metpy.parse_cf()
xarray.DataArray
For each variable of the dataset as an xarray.DataArray
, calling ds.VARIABLE.metpy.DIMENSION
has MetPy automatically return the appropriate coordinate variable (no matter what it is named, e.g. lat
, lon
, time
, time1
, altitude_above_msl
, isobaric3
, height_above_ground1
), where DIMENSION
is one of time
, vertical
, x
, and y
.
In this case, ds.Absolute_vorticity_isobaric.metpy.time
returns ds.time
, and ds.Absolute_vorticity_isobaric.metpy.vertical
returns ds.isobaric2
. Adding .values
to the call returns just the numpy.ndarray
with the values I have been trying to get. So, calling ds.Absolute_vorticity_isobaric.metpy.time.values
produces the following (which is truncated below):
array(['2019-11-17T00:00:00.000000000', '2019-11-17T03:00:00.000000000',
'2019-11-17T06:00:00.000000000', ..., '2020-01-02T06:00:00.000000000',
'2020-01-02T09:00:00.000000000', '2020-01-02T12:00:00.000000000'],
dtype='datetime64[ns]')
Calling ds.Absolute_vorticity_isobaric.metpy.time.values
and ds.Absolute_vorticity_isobaric.metpy.vertical.values
will return just the NumPy arrays, which is what I seek.
While the above does in fact do what I want, it took nearly a minute and a half to run for just one variable, and it (I assume) unnecessarily taxes UCAR servers. Is there any way to get the output above without the massive overhead of loading all of that data itself?
If you are concerned about the performance of your original method, and only wish to extract the time and vertical coordinates, I would recommend using OPENDAP to access your data rather than NCSS. This will simply fetch the metadata at first, and then will lazily load the data that you request (time and vertical coordinates, in your case). Using MetPy v0.11 or newer, an example script using your TDS Catalog of interest would look something like the following:
import metpy
import xarray as xr
from siphon.catalog import TDSCatalog
cat = TDSCatalog("https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_onedeg/catalog.xml?dataset=grib/NCEP/GFS/Global_onedeg/Best")
opendap_url = cat.datasets[0].access_urls['OPENDAP']
ds = xr.open_dataset(opendap_url)
time = ds['Absolute_vorticity_isobaric'].metpy.time.values
vertical = ds['Absolute_vorticity_isobaric'].metpy.vertical.values
print(time)
print(vertical)
This takes roughly a half-second to run on my system.
If you instead have MetPy older than v0.11, you will need to use .metpy.parse_cf()
when opening the dataset, as follows:
ds = xr.open_dataset(opendap_url).metpy.parse_cf()