python pandas numpy datetime python-xarray

Select xarray dataset based on month


I have an xarray dataset with the following info:

Coordinates:
lat: float64 (192)
lon: float64 (288)
time: object (1200) (monthly data)

Data Variables:
tas: (time, lat, lon)

Now I want the values of tas for a specific month. For example, I want a new dataset containing only the January records.

The output dataset would look like this:

Coordinates:
lat: float64 (192)
lon: float64 (288)
time: object (100) (monthly data of January)

Data Variables:
tas: (time, lat, lon)

I tried an approach that I have used before:

jan = pd.date_range(start='1979-01-01', periods=41, freq='AS-JAN').date.tolist()
gs_jan = gs.sel(time = jan)

But this won't work in my case because my dates fall in years 0001-0100, and pandas doesn't support dates in that range!


Solution

  • Generally, for analysing time-series data like this, you want to follow the split-apply-combine approach using xarray's groupby() method (http://xarray.pydata.org/en/stable/groupby.html).

    In your case, I'd suggest trying:

    # Use .groupby('time.month') to organize the data by calendar month,
    # then use .groups to get a dict mapping each month number (1-12)
    # to the integer positions of its time steps
    month_idxs = gs.groupby('time.month').groups

    # Extract the time indices corresponding to all the Januaries
    jan_idxs = month_idxs[1]

    # Select the January time steps by integer position
    gs_jan = gs.isel(time=jan_idxs)
    

    Hope this helps!
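
To make it clearer what the month-to-indices mapping contains, here is a minimal plain-Python sketch of the same idea. It is not xarray's implementation: `times` is a hypothetical stand-in for the time coordinate (built from `datetime.date`, which accepts years 0001-0100, whereas the real dataset most likely holds cftime objects), and `month_groups` is an illustrative helper, not an xarray function.

```python
from collections import defaultdict
from datetime import date

# Hypothetical stand-in for the time coordinate: one stamp per month
# for years 0001-0100 (1200 monthly steps, as in the question).
times = [date(year, month, 1) for year in range(1, 101) for month in range(1, 13)]


def month_groups(stamps):
    """Map each month number (1-12) to the integer positions where it occurs."""
    groups = defaultdict(list)
    for position, stamp in enumerate(stamps):
        groups[stamp.month].append(position)
    return dict(groups)


# Analogous to gs.groupby('time.month').groups in the answer above
month_idxs = month_groups(times)

# Positions of every January, analogous to month_idxs[1]
jan_idxs = month_idxs[1]

# Positional selection, analogous to gs.isel(time=jan_idxs)
jan_times = [times[i] for i in jan_idxs]

print(len(times), len(jan_idxs))  # 1200 100
print(jan_idxs[:3])               # [0, 12, 24]
```

Because the selection is purely positional, it never converts the dates to pandas timestamps, which is why the out-of-range years are not a problem. If the time coordinate supports xarray's `.dt` accessor, a boolean mask such as `gs.sel(time=gs.time.dt.month == 1)` is a commonly used alternative, though it has not been checked against this particular dataset.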