
How to use MFDataset to read multiple files in OPeNDAP dataset with Python NetCDF4 module?


I have an OPeNDAP THREDDS link to a directory holding many oceanographic model output files from the Delaware Bay Operational Forecast System (DBOFS). Historical data are stored in separate hourly files, and some files span multiple hours. I'd like to treat the files as if they were one long time series. I came across another question asking something similar here: Loop through netcdf files and run calculations - Python or R

Passing a URL containing a wildcard character to MFDataset returned the following error:

import netCDF4

f = netCDF4.MFDataset('http://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/DBOFS/MODELS/201401/nos.dbofs.fields.n001.20140130.*.nc')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-a44e21cddbe9> in <module>()
----> 1 f = netCDF4.MFDataset('http://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/DBOFS/MODELS/201401/nos.dbofs.fields.n001.20140130.*.nc')

C:\Users\cenglert\AppData\Local\Enthought\Canopy32\User\lib\site-packages\netCDF4.pyd in netCDF4.MFDataset.__init__ (netCDF4.c:6458)()

ValueError: cannot using file globbing for remote (OPeNDAP) datasets

Solution

  • As the error says, you can't use globbing (a * wildcard) on remote datasets, but you can build a Python list of dataset URLs and pass it to MFDataset, like this:

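    import netCDF4

    # Adjacent string literals are joined, so the URL contains no stray whitespace.
    base = ('http://opendap.co-ops.nos.noaa.gov/thredds/dodsC/'
            'NOAA/DBOFS/MODELS/201401/nos.dbofs.fields.n001.20140130.t%2.2dz.nc')
    # One URL per 6-hourly file: t00z, t06z, t12z, t18z.
    files = [base % d for d in range(0, 24, 6)]
    nc = netCDF4.MFDataset(files)
    print(nc.variables['salt'])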
    

    which produces:

    <class 'netCDF4._Variable'>
    float64 salt('ocean_time', 's_rho', 'eta_rho', 'xi_rho')
        long_name: salinity
        time: ocean_time
        coordinates: lat_rho lon_rho
        field: salinity, scalar, series
    unlimited dimensions = ('ocean_time',)
    current size = (4, 10, 732, 119)
    

    The size of 4 along ocean_time shows that the four files at hours 0, 6, 12, and 18 have in fact been virtually aggregated by MFDataset.
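
To cover more than one day, the same idea can be extended by generating the URL list from dates. The following is only a sketch: build_urls is a hypothetical helper (not part of netCDF4), and it assumes that other days follow the same YYYYMM directory layout and tHHz file naming as the single day shown above, which may not hold for every file on the server.

```python
from datetime import date, timedelta

# Assumed URL pattern, extrapolated from the single-day example above.
BASE = ('http://opendap.co-ops.nos.noaa.gov/thredds/dodsC/'
        'NOAA/DBOFS/MODELS/{day:%Y%m}/'
        'nos.dbofs.fields.n001.{day:%Y%m%d}.t{hour:02d}z.nc')

def build_urls(start, ndays, hours=(0, 6, 12, 18)):
    """Hypothetical helper: list one URL per day and forecast hour."""
    urls = []
    for offset in range(ndays):
        day = start + timedelta(days=offset)
        for hour in hours:
            urls.append(BASE.format(day=day, hour=hour))
    return urls

urls = build_urls(date(2014, 1, 30), 2)
print(len(urls))
print(urls[0])
```

The resulting list can then be passed to netCDF4.MFDataset(urls) exactly as in the answer above. Any URL that does not exist on the server would cause the open to fail, so it may be worth checking the list against the THREDDS catalog first.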