I have a netCDF4 dataset whose time coordinate, stored in the variable _date_times, looks like this:
<xarray.DataArray 'Time' (Time: 21)> Size: 168B
array(['2025-01-30T00:00:00.000000000', '2025-01-30T06:00:00.000000000',
'2025-01-30T12:00:00.000000000', '2025-01-30T18:00:00.000000000',
'2025-01-31T00:00:00.000000000', '2025-01-31T06:00:00.000000000',
'2025-01-31T12:00:00.000000000', '2025-01-31T18:00:00.000000000',
'2025-02-01T00:00:00.000000000', '2025-02-01T06:00:00.000000000',
'2025-02-01T12:00:00.000000000', '2025-02-01T18:00:00.000000000',
'2025-02-02T00:00:00.000000000', '2025-02-02T06:00:00.000000000',
'2025-02-02T12:00:00.000000000', '2025-02-02T18:00:00.000000000',
'2025-02-03T00:00:00.000000000', '2025-02-03T06:00:00.000000000',
'2025-02-03T12:00:00.000000000', '2025-02-03T18:00:00.000000000',
'2025-02-04T00:00:00.000000000'], dtype='datetime64[ns]')
The data above are at six-hour intervals. However, I need to convert the dataset to a twelve-hourly dataset. The filtered dataset should look like this:
<xarray.DataArray 'Time' (Time: 11)> Size: 88B
array(['2025-01-30T00:00:00.000000000', '2025-01-30T12:00:00.000000000',
'2025-01-31T00:00:00.000000000', '2025-01-31T12:00:00.000000000',
'2025-02-01T00:00:00.000000000', '2025-02-01T12:00:00.000000000',
'2025-02-02T00:00:00.000000000', '2025-02-02T12:00:00.000000000',
'2025-02-03T00:00:00.000000000', '2025-02-03T12:00:00.000000000',
'2025-02-04T00:00:00.000000000'], dtype='datetime64[ns]')
What I tried was:
xr_ds.sel(Time=slice(_date_times[0], _date_times[-1]), freq='12 h')
Of course, this doesn't work, as sel() has no option to specify freq.
How do I select only the times at a specific interval from the dataset?
You don't have to use a slice() to select the times; you can also specify a list or array of times. Here, I used Pandas' date_range() for simplicity:
import xarray as xr
import pandas as pd
import numpy as np
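# Example file; its time coordinate is named 'time' (in the question it is 'Time').
ds = xr.open_dataset('202001.nc')
# Build the 12-hourly timestamps between the first and last time step,
# then select exactly those labels.
times = pd.date_range(ds.time[0].values, ds.time[-1].values, freq='12h')
dst = ds.sel(time=times)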
This results in:
In [10]: dst.time
Out[10]:
<xarray.DataArray 'time' (time: 62)> Size: 496B
array(['2020-01-01T00:00:00.000000000', '2020-01-01T12:00:00.000000000',
'2020-01-02T00:00:00.000000000', '2020-01-02T12:00:00.000000000',
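Applied to the six-hourly data from the question, the same approach can be sketched as follows. The dataset here is synthetic, and the variable name t2m is made up for illustration; only the Time coordinate mirrors the question.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the question's dataset: 21 six-hourly time steps.
six_hourly = pd.date_range("2025-01-30T00:00", "2025-02-04T00:00", freq="6h")
xr_ds = xr.Dataset(
    {"t2m": ("Time", np.arange(six_hourly.size, dtype=float))},  # hypothetical variable
    coords={"Time": six_hourly},
)

# Build the 12-hourly labels over the same range and select them by label.
wanted = pd.date_range(six_hourly[0], six_hourly[-1], freq="12h")
ds_12h = xr_ds.sel(Time=wanted)

print(ds_12h["Time"].values)
```

Note that sel() with a list of labels raises a KeyError if any requested timestamp is missing from the coordinate, so this works only when every 12-hourly time is actually present in the data.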
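An alternative is to use ds.isel() with an array of indexes. The step of 12 assumes hourly data (as the example file appears to be); for the six-hourly data in the question, the step would be 2.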
dst = ds.isel(time=np.arange(0, ds.time.size, 12))
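Rather than hard-coding the stride, it can be derived from the spacing of the time coordinate. A minimal sketch, using synthetic six-hourly data like the question's and assuming the time steps are evenly spaced (the variable name t2m is made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic six-hourly dataset with a 'Time' coordinate, as in the question.
times = pd.date_range("2025-01-30T00:00", "2025-02-04T00:00", freq="6h")
xr_ds = xr.Dataset(
    {"t2m": ("Time", np.arange(times.size, dtype=float))},  # hypothetical variable
    coords={"Time": times},
)

# Spacing between the first two time steps (assumes regular spacing throughout).
time_values = xr_ds["Time"].values
spacing = pd.Timedelta(time_values[1] - time_values[0])

# Number of input steps per 12-hour output step.
step = int(pd.Timedelta("12h") / spacing)

# Positional selection: keep every step-th time index.
ds_12h = xr_ds.isel(Time=np.arange(0, xr_ds.sizes["Time"], step))

print(step, ds_12h.sizes["Time"])
```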
Or you can simply slice the time array from the original dataset, if you really want to avoid Pandas/Numpy:
dst = ds.sel(time=ds.time[::12])
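For the six-hourly data in the question, the stride would be 2 instead of 12. The sketch below, on synthetic data with a made-up variable name t2m, shows slicing the coordinate and passing it to sel(), next to the equivalent positional slice with isel():

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic six-hourly dataset with a 'Time' coordinate, as in the question.
times = pd.date_range("2025-01-30T00:00", "2025-02-04T00:00", freq="6h")
xr_ds = xr.Dataset(
    {"t2m": ("Time", np.arange(times.size, dtype=float))},  # hypothetical variable
    coords={"Time": times},
)

# Label-based: take every second label from the coordinate and select those.
by_label = xr_ds.sel(Time=xr_ds["Time"][::2])

# Position-based: the same selection expressed as a strided slice.
by_position = xr_ds.isel(Time=slice(None, None, 2))

print(by_label["Time"].values)
```

Both forms keep every second time step, so they only give a clean 00:00/12:00 series when the first time step falls on one of those hours.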