Search code examples
netcdfpython-datetimenetcdf4python-xarray

Python xarray's open_mfdataset works in an unexpected way


I have 365 daily netCDF files for the year 1980. These files are located in a folder that has data from multiple years (1979-2013).

When I open the 1980 files using,

files  = glob.glob("/mnt/nfs/home/solomon/Data/CFSR/NetCDFs_1979-2013/Subset/data_1980*")
ds = xarray.open_mfdataset(files, engine="netcdf4")

The time stamp seems to be incorrect. When I print out the time, I get the following:

ds.time.sortby("time")
Out[28]: 
<xarray.DataArray 'time' (time: 3286)>
array(['1979-01-07T00:00:00.000000000', '1979-01-07T03:00:00.000000000',
       '1979-01-07T06:00:00.000000000', ..., '2013-12-23T18:00:00.000000000',
       '2013-12-23T21:00:00.000000000', '2013-12-24T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 1979-01-07 1979-01-07T03:00:00 ...
Attributes:
    standard_name:  time
    axis:           T

To check if the other files in the folder are being read, I changed the contents of the folder (i.e. I removed the 2012 files), but I still get the same time series as before. I am not sure what is wrong!

Out[29]: 
<xarray.DataArray 'time' (time: 3286)>
array(['1979-01-07T00:00:00.000000000', '1979-01-07T03:00:00.000000000',
       '1979-01-07T06:00:00.000000000', ..., '2013-12-23T18:00:00.000000000',
       '2013-12-23T21:00:00.000000000', '2013-12-24T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 1979-01-07 1979-01-07T03:00:00 ...
Attributes:
    standard_name:  time
    axis:           T

The NetCDF data has the metadata as follows (using ncdump -h):

svimal@lettenmaierlab06:/mnt/nfs/home/solomon/Data/CFSR/NetCDFs_1979-2013/Subset$ ncdump -h data_19800530.nc
netcdf data_19800530 {
dimensions:
    lon = 503 ;
    lat = 170 ;
    time = UNLIMITED ; // (1 currently)
variables:
    double lon(lon) ;
        lon:standard_name = "longitude" ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_east" ;
        lon:axis = "X" ;
    double lat(lat) ;
        lat:standard_name = "latitude" ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
        lat:axis = "Y" ;
    double time(time) ;
        time:standard_name = "time" ;
        time:units = "hours since 1999-5-16 00:00:00" ;
        time:calendar = "standard" ;
        time:axis = "T" ;
    float air_temp(time, lat, lon) ;
        air_temp:long_name = "air temperuature (C)" ;
        air_temp:_FillValue = -9.99e+08f ;
        air_temp:missing_value = -9.99e+08f ;
    float vp(time, lat, lon) ;
        vp:long_name = "vapor pressure (kPa)" ;
        vp:_FillValue = -9.99e+08f ;
        vp:missing_value = -9.99e+08f ;
    float pressure(time, lat, lon) ;
        pressure:long_name = "pressure (kPa)" ;
        pressure:_FillValue = -9.99e+08f ;
        pressure:missing_value = -9.99e+08f ;
    float windspd(time, lat, lon) ;
        windspd:long_name = "wind (m/s)" ;
        windspd:_FillValue = -9.99e+08f ;
        windspd:missing_value = -9.99e+08f ;
    float shortwave(time, lat, lon) ;
        shortwave:long_name = "downward shortwave (W/m^2)" ;
        shortwave:_FillValue = -9.99e+08f ;
        shortwave:missing_value = -9.99e+08f ;
    float longwave(time, lat, lon) ;
        longwave:long_name = "downward longwave (W/m^2)" ;
        longwave:_FillValue = -9.99e+08f ;
        longwave:missing_value = -9.99e+08f ;
    float precip(time, lat, lon) ;
        precip:long_name = "precipitation (mm/hr)" ;
        precip:_FillValue = -9.99e+08f ;
        precip:missing_value = -9.99e+08f ;

// global attributes:
        :CDI = "Climate Data Interface version ?? (http://mpimet.mpg.de/cdi)" ;
        :Conventions = "CF-1.4" ;
        :history = "Tue Mar 20 14:36:48 2018: ncea -d lat,41.375,83.625 -d lon,181.375,306.875 /mnt/nfs/home/solomon/Data/CFSR/NetCDFs_1979-2013/data_19800530.nc /mnt/nfs/home/solomon/Data/CFSR/NetCDFs_1979-2013/Subset/data_19800530.nc\n",
            "Tue Mar 20 14:36:44 2018: cdo -f nc import_binary /mnt/nfs/home/solomon/Data/CFSR/CFSR-LAND_Global_0.25deg_data_changed.ctl /mnt/nfs/home/solomon/Data/CFSR/NetCDFs_1979-2013/data_19800530.nc" ;
        :CDO = "Climate Data Operators version 1.7.0 (http://mpimet.mpg.de/cdo)" ;
        :NCO = "\"4.5.4\"" ;
        :nco_openmp_thread_number = 1 ;
}

The time attribute says

time = UNLIMITED ; // (1 currently)

I am not sure what this means, could this be the issue?


Solution

  • Are you sure that your use of glob.glob() is only returning netCDF files with times in 1980?

    I would suggest a spot-check with an explicit loop (skipping open_mfdataset):

    files = glob.glob("/mnt/nfs/home/solomon/Data/CFSR/NetCDFs_1979-2013/Subset/data_1980*")
    for path in files:
        ds = xarray.open_dataset(path, engine="netcdf4")
        print(path, ds.time.values)
    

    Side note: it's best to pass the glob string directly into open_mfdataset() rather than explicitly calling glob.glob(). It is a little more succinct, and xarray also calls sorted() on glob strings that it parses, rather than relying on the platform specific order of the list returned by glob.glob().