python, python-xarray, netcdf

Merging/flattening a list of xarray.Dataset objects


I looped over several multidimensional NetCDF files to extract a variable of interest using xarray functions and stored the outputs as a list of xarray.core.dataset.Dataset. I'm not familiar with xarray or this data format. I need to merge or flatten the list into a single xarray-readable NetCDF file or Dataset format. Any help would be appreciated. Here's my code:

import glob
import xarray as xr

all_fwi = glob.glob("D:/FWI_future/fwi_intermediary/fwi_day_CanESM5_ssp245*.nc")


fwi_list = []
for i in all_fwi:
    infile = xr.open_dataset(i, drop_variables=['TEMP_wDC_2014','ffmcPREV_2014', 'dcPREV_2014', 'dmcPREV_2014','SeasonActive_2014', 'DCf_2014', 'rw_2014', 'CounterSeasonActive_2014','ffmc', 'dc','dmc', 'isi', 'bui', 'TEMP', 'RH', 'RAIN', 'WIND'])
    yt = infile.sel(lon=slice(218.9931, 236.2107), lat=slice(60, 69.64794))
    yt1 = yt.sel(time=slice('2020-01-01', '2060-12-31'))
    yt2 = yt1.sel(time=yt1.time.dt.month.isin([3, 4, 5, 6, 7, 8, 9]))
    fwi_list.extend(yt2)

Here's the yt2 data structure:

yt2
<xarray.Dataset> Size: 1MB
Dimensions:    (time: 8774, lat: 3, lon: 6, bnds: 2, days_wDC: 5)
Coordinates:
  * time       (time) object 70kB 2020-03-01 12:00:00 ... 2060-09-30 12:00:00
  * lat        (lat) float64 24B 62.79 65.58 68.37
  * lon        (lon) float64 48B 219.4 222.2 225.0 227.8 230.6 233.4
  * days_wDC   (days_wDC) <U5 100B 'day-2' 'day-1' 'day' 'day+1' 'day+2'
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object 140kB ...
    lat_bnds   (lat, bnds) float64 48B ...
    lon_bnds   (lon, bnds) float64 96B ...
    fwi        (time, lat, lon) float64 1MB ...

The rioxarray.merge function would have been ideal, but it now seems to be deprecated with no replacement. I also tried netCDF4.Dataset, but I get an error message:

fin_dat = netCDF4.Dataset(fwi_list, 'w', format='NETCDF4')

PermissionError: [Errno 13] Permission denied: "['time_bnds', 'lat_bnds', 'lon_bnds', 'fwi'...


Solution

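  • Figured it out. Build the list with append instead of extend, then concatenate the list with xarray.concat. extend iterates over the Dataset and adds only its variable names (the strings visible in the error message above), whereas append stores the Dataset object itself. Specifically: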

    fwi_list = []
    for i in all_fwi:
        infile = xr.open_dataset(i, drop_variables=['TEMP_wDC_2014','ffmcPREV_2014', 'dcPREV_2014', 'dmcPREV_2014','SeasonActive_2014', 'DCf_2014', 'rw_2014', 'CounterSeasonActive_2014','ffmc', 'dc','dmc', 'isi', 'bui', 'TEMP', 'RH', 'RAIN', 'WIND'])
        yt = infile.sel(lon=slice(218.9931, 236.2107), lat=slice(60, 69.64794))
        yt1 = yt.sel(time=slice('2020-01-01', '2060-12-31'))
        yt2 = yt1.sel(time=yt1.time.dt.month.isin([3, 4, 5, 6, 7, 8, 9]))
        fwi_list.append(yt2)  # append (not extend) keeps each Dataset intact

    # Concatenate once, after the loop; concatenating inside the loop would slow the run.
    final_dat = xr.concat(fwi_list, dim='time')
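
The difference between extend and append can be checked without the original files. The following is a minimal sketch, not the original workflow: make_dataset is a hypothetical helper that builds small synthetic stand-ins for yt2, using zeros instead of real FWI values and plain pandas dates instead of the cftime objects in the real data.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_dataset(start, periods):
    """Hypothetical helper: a tiny synthetic stand-in for one file's yt2."""
    time = pd.date_range(start, periods=periods, freq="D")
    fwi = np.zeros((periods, 3, 6))  # placeholder values, not real FWI data
    return xr.Dataset(
        {"fwi": (("time", "lat", "lon"), fwi)},
        coords={
            "time": time,
            "lat": [62.79, 65.58, 68.37],
            "lon": [219.4, 222.2, 225.0, 227.8, 230.6, 233.4],
        },
    )


pieces = [make_dataset("2020-03-01", 4), make_dataset("2021-03-01", 5)]

# list.extend() iterates over its argument. Iterating a Dataset yields the
# names of its data variables, so only strings end up in the list.
extended = []
for ds in pieces:
    extended.extend(ds)

# list.append() stores the Dataset object itself.
appended = []
for ds in pieces:
    appended.append(ds)

# Concatenate once, after the loop, along the shared "time" dimension.
combined = xr.concat(appended, dim="time")

print(extended)        # variable names only
print(combined.sizes)  # the time length is the sum of the pieces
```

If a single NetCDF file is needed rather than an in-memory Dataset, the result can then be written with Dataset.to_netcdf, for example combined.to_netcdf("fwi_combined.nc"). The file name here is hypothetical, and writing requires a NetCDF backend such as netCDF4 to be installed.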