python, python-xarray, netcdf4

Slicing netCDF4 dataset based on specific time interval using xarray


I have a netCDF4 dataset with the following datetimes, which are stored in the _date_times variable:

<xarray.DataArray 'Time' (Time: 21)> Size: 168B
array(['2025-01-30T00:00:00.000000000', '2025-01-30T06:00:00.000000000',
       '2025-01-30T12:00:00.000000000', '2025-01-30T18:00:00.000000000',
       '2025-01-31T00:00:00.000000000', '2025-01-31T06:00:00.000000000',
       '2025-01-31T12:00:00.000000000', '2025-01-31T18:00:00.000000000',
       '2025-02-01T00:00:00.000000000', '2025-02-01T06:00:00.000000000',
       '2025-02-01T12:00:00.000000000', '2025-02-01T18:00:00.000000000',
       '2025-02-02T00:00:00.000000000', '2025-02-02T06:00:00.000000000',
       '2025-02-02T12:00:00.000000000', '2025-02-02T18:00:00.000000000',
       '2025-02-03T00:00:00.000000000', '2025-02-03T06:00:00.000000000',
       '2025-02-03T12:00:00.000000000', '2025-02-03T18:00:00.000000000',
       '2025-02-04T00:00:00.000000000'], dtype='datetime64[ns]')

The data above has a six-hour interval. However, I need to convert the dataset to a twelve-hourly dataset. The filtered dataset should look like this:

<xarray.DataArray 'Time' (Time: 11)> Size: 88B
array(['2025-01-30T00:00:00.000000000', '2025-01-30T12:00:00.000000000',
       '2025-01-31T00:00:00.000000000', '2025-01-31T12:00:00.000000000',
       '2025-02-01T00:00:00.000000000', '2025-02-01T12:00:00.000000000',
       '2025-02-02T00:00:00.000000000', '2025-02-02T12:00:00.000000000',
       '2025-02-03T00:00:00.000000000', '2025-02-03T12:00:00.000000000',
       '2025-02-04T00:00:00.000000000'], dtype='datetime64[ns]')

What I tried:

xr_ds.sel(Time=slice(_date_times[0], _date_times[-1]), freq='12 h')

Of course, this doesn't work, because sel() has no option to specify freq.

How do I slice the dataset so that it only contains times at a specific interval?


Solution

  • You don't have to use slice() to select the times; you can also pass a list or array of times. Here, I used Pandas date_range() for simplicity (note that my example file calls the dimension time rather than Time):

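    import xarray as xr
    import pandas as pd
    import numpy as np

    ds = xr.open_dataset('202001.nc')
    # Build the wanted 12-hourly timestamps from the first to the last time step
    times = pd.date_range(ds.time[0].values, ds.time[-1].values, freq='12h')
    # Label-based selection; raises a KeyError if any of these times is missing
    dst = ds.sel(time=times)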
    

    This results in:

    In [10]: dst.time
    Out[10]: 
    <xarray.DataArray 'time' (time: 62)> Size: 496B
    array(['2020-01-01T00:00:00.000000000', '2020-01-01T12:00:00.000000000',
           '2020-01-02T00:00:00.000000000', '2020-01-02T12:00:00.000000000',
    

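    An alternative is to use ds.isel() with an array of indices. The step of 12 in this and the next example assumes hourly data, as in my example file; for the 6-hourly data in the question, use a step of 2 to get 12-hour spacing.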

    dst = ds.isel(time=np.arange(0, ds.time.size, 12))
    

    Or, if you really want to avoid Pandas/NumPy, you can simply slice the time coordinate of the original dataset:

    dst = ds.sel(time=ds.time[::12])
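The snippets above were written for a different (apparently hourly) example file. As a rough check against the 6-hourly data in the question, the following sketch builds a hypothetical stand-in dataset (the Time dimension matches the question; the variable t2m and its values are made up) and applies the same three approaches with a step of 2. The last selection, which keeps only timestamps whose hour is 0 or 12, is an extra alternative that is not part of the answer above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the question's file: 21 six-hourly steps on a
# "Time" dimension, plus a made-up variable "t2m" (not from the original data).
time = pd.date_range("2025-01-30T00:00", "2025-02-04T00:00", freq="6h")
ds = xr.Dataset(
    {"t2m": ("Time", np.arange(time.size, dtype=float))},
    coords={"Time": time},
)

# 1) Label-based: build the wanted 12-hourly timestamps and select them.
#    sel() raises a KeyError if any requested timestamp is missing.
wanted = pd.date_range(ds.Time[0].values, ds.Time[-1].values, freq="12h")
by_labels = ds.sel(Time=wanted)

# 2) Position-based: the data is 6-hourly, so every 2nd step is 12 hours apart.
step = 2
by_index = ds.isel(Time=np.arange(0, ds.Time.size, step))

# 3) Same idea, slicing the Time coordinate itself and selecting by those labels.
by_slice = ds.sel(Time=ds.Time[::step])

# 4) Extra alternative (not part of the original answer): keep only the steps
#    at 00:00 and 12:00, using the .dt accessor on the Time coordinate.
by_hour = ds.sel(Time=ds.Time.dt.hour.isin([0, 12]))

print(by_labels.Time.size)  # 11
```

Label-based selection fails loudly if a timestamp is missing, while the positional step silently assumes a perfectly regular time axis that starts at 00:00 or 12:00; the hour-based filter simply keeps whichever 00:00 and 12:00 steps exist.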