python merge python-xarray netcdf4

How to use xarray to merge specific netCDF4 files


Background

I have two years' worth of netCDF4 files (one file per day). I have been using xarray to merge the files, which makes them easier to work with. All files follow the same naming convention, "YYYYMMDD_data_Nx.nc4.nc".

Question

However, what do I do if I only want to use a subset of my data, for example the files between 1/1/2019 and 31/1/2019?

What I've currently got

import xarray as xr

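# open_mfdataset (not open_dataset) accepts a glob and merges every matching file
ds = xr.open_mfdataset('C:\\Users\\FILES\\*.nc')
# Convert to a pandas DataFrame before exporting; a Dataset has no to_csv method
df = ds.to_dataframe()
df.to_csv('export.csv', index=True)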

Solution

  • Solved

    I looked at the xarray documentation on Read the Docs and found this description of the paths parameter on the open_mfdataset page:

    paths (str or sequence) – Either a string glob in the form "path/to/my/files/*.nc" or an explicit list of files to open. Paths can be given as strings or as pathlib Paths. If concatenation along more than one dimension is desired, then paths must be a nested list-of-lists (see manual_combine for details). (A string glob will be expanded to a 1-dimensional list.)

    So, instead of a glob, I passed an explicit list of file paths.
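Another way to build such a list, shown here only as a sketch, is to take the files already in the folder and keep those whose YYYYMMDD prefix falls in the wanted range. The helper name `select_by_date_prefix`, its parameters, and the default pattern are illustrative assumptions and not part of xarray. It relies only on the naming convention from the question: zero-padded YYYYMMDD prefixes sort in date order when compared as plain strings.

```python
from pathlib import Path


def select_by_date_prefix(folder, start, end, pattern="*_data_Nx.nc"):
    """Return sorted file paths whose YYYYMMDD name prefix lies in [start, end].

    start and end are strings such as "20190101"; both ends are included.
    The pattern is an assumption and should be adjusted to the real file names.
    """
    selected = []
    for path in sorted(Path(folder).glob(pattern)):
        stamp = path.name[:8]  # first eight characters should be the date
        # Skip files that do not start with a date, then compare as strings
        if stamp.isdigit() and start <= stamp <= end:
            selected.append(str(path))
    return selected
```

The returned list can then be passed to `xr.open_mfdataset` in place of a glob string. Unlike a date loop, this approach only ever lists files that exist on disk.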

    Updated & Working Code

    import xarray as xr
    from datetime import timedelta, date
    
    
    # **************
    # Date Ranges
    # **************
    def daterange(start_date, end_date):
        # Yield every date from start_date to end_date, inclusive (+1 keeps the last day)
        for n in range((end_date - start_date).days + 1):
            yield start_date + timedelta(days=n)
    
    
    # Start & End Date
    start_date = date(2019, 1, 1)
    end_date = date(2019, 1, 31)
    
    # Folder containing the files, and an empty list to collect the file names
    filepath = 'C:\\Users\\USER\\FILES\\'
    filelist = []
    
    # Loop through the date range and add the MERRA2 file for each day to the list
    for single_date in daterange(start_date, end_date):
        # File names start with YYYYMMDD; the suffix must match the files on disk
        filename = filepath + single_date.strftime("%Y%m%d") + '_data_Nx.nc'
        filelist.append(filename)
    
    # Merge via xarray and export to csv
    ds = xr.open_mfdataset(filelist, combine='by_coords')
    df = ds.to_dataframe()
    df.to_csv('export.csv', index=True)
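
Because the list above is built from dates rather than from the folder contents, a day with no file on disk would most likely make the open call fail. A minimal sketch of a pre-check follows; the helper names `expected_files` and `split_present_missing` and the default suffix are hypothetical and should be adapted to the real file names.

```python
from datetime import date, timedelta
from pathlib import Path


def expected_files(folder, start, end, suffix="_data_Nx.nc"):
    """Yield one expected path per day from start to end, inclusive."""
    for n in range((end - start).days + 1):
        day = start + timedelta(days=n)
        # date objects support strftime codes inside f-string format specs
        yield Path(folder) / f"{day:%Y%m%d}{suffix}"


def split_present_missing(paths):
    """Separate paths into those that exist as files and those that do not."""
    present, missing = [], []
    for path in paths:
        (present if path.is_file() else missing).append(path)
    return present, missing
```

One possible use is to call `split_present_missing(expected_files(filepath, start_date, end_date))`, print the missing entries, and pass `[str(p) for p in present]` to `xr.open_mfdataset(..., combine='by_coords')` as in the code above.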