Suppose I have the following xarray DataArray:
>>> da
<xarray.DataArray 'precip' (time: 521, lat: 72, lon: 144)> Size: 22MB
[5401728 values with dtype=float32]
Coordinates:
* lat (lat) float32 288B 88.75 86.25 83.75 81.25 ... -83.75 -86.25 -88.75
* lon (lon) float32 576B 1.25 3.75 6.25 8.75 ... 351.2 353.8 356.2 358.8
* time (time) datetime64[ns] 4kB 1979-01-01 1979-02-01 ... 2022-05-01
It contains monthly mean data from January 1979 to May 2022. I have calculated the yearly means from this data:
>>> yearly_mean=da.resample({'time':'YS'}).mean()
>>> print(yearly_mean)
<xarray.DataArray 'precip' (time: 44, lat: 72, lon: 144)> Size: 2MB
array([[[ 0.5208333 , 0.5141667 , 0.51 , ..., 0.54833335,
0.53833336, 0.5283333 ],
[ 0.4725 , 0.4766667 , 0.4883333 , ..., 0.48499998,
0.47416666, 0.47166666],
[ 0.6399999 , 0.68 , 0.7308333 , ..., 0.5975 ,
0.6016666 , 0.61333334],
...,
[ 0.05666666, 0.05833334, 0.05833334, ..., 0.05416666,
0.05416666, 0.05083333],
[ 0.0725 , 0.07166667, 0.07166667, ..., 0.07416666,
0.0725 , 0.06583334],
[ 0.11333334, 0.11166667, 0.11083334, ..., 0.11583333,
0.115 , 0.10166666]],
[[ 0.5125 , 0.50916666, 0.505 , ..., 0.53416663,
0.525 , 0.5183333 ],
[ 0.43249997, 0.43916664, 0.45000002, ..., 0.4358333 ,
0.4308333 , 0.42999998],
[ 0.5983333 , 0.6266666 , 0.6716667 , ..., 0.5758334 ,
0.5783333 , 0.5841667 ],
...
[ 0.13250001, 0.10916666, 0.09833334, ..., 0.21333332,
0.1875 , 0.16 ],
[ 0.07666666, 0.07583333, 0.075 , ..., 0.0775 ,
0.0775 , 0.07666666],
[ 0.06333333, 0.06333333, 0.06166667, ..., 0.06416667,
0.06333333, 0.06333333]],
[[ 0.35200003, 0.34199998, 0.336 , ..., 0.38599998,
0.374 , 0.36200002],
[ 0.42599997, 0.444 , 0.45999998, ..., 0.394 ,
0.40399998, 0.41399997],
[ 0.49 , 0.528 , 0.582 , ..., 0.43800002,
0.45 , 0.468 ],
...,
[ 1.8119999 , 2.0379999 , 2.35 , ..., 1.436 ,
1.5239999 , 1.6200001 ],
[ 6.03 , 6.708 , 7.484 , ..., 4.4660006 ,
4.9140005 , 5.4300003 ],
[13.785998 , 14.286 , 14.806 , ..., 12.434 ,
12.855998 , 13.309999 ]]], dtype=float32)
Coordinates:
* lat (lat) float32 288B 88.75 86.25 83.75 81.25 ... -83.75 -86.25 -88.75
* lon (lon) float32 576B 1.25 3.75 6.25 8.75 ... 351.2 353.8 356.2 358.8
* time (time) datetime64[ns] 352B 1979-01-01 1980-01-01 ... 2022-01-01
I want to calculate the difference between each monthly mean and the corresponding yearly mean. If there were only a single year of data, I could simply have computed da - yearly_mean. Since I have data for multiple years, this does not work correctly. Xarray completes the operation without raising an error, but the result is not what I want; even the shape of the resulting DataArray is not what I was expecting:
>>> d=da-yearly_mean
>>> print(d)
<xarray.DataArray 'precip' (time: 44, lat: 72, lon: 144)> Size: 2MB
array([[[-3.10833335e-01, -3.04166734e-01, -3.00000012e-01, ...,
-3.18333328e-01, -3.18333358e-01, -3.08333308e-01],
[-3.52499992e-01, -3.46666694e-01, -3.38333309e-01, ...,
-3.64999980e-01, -3.64166677e-01, -3.61666679e-01],
[-4.19999927e-01, -4.30000007e-01, -4.30833280e-01, ...,
-3.97500038e-01, -4.01666641e-01, -4.13333356e-01],
...,
[ 1.33333392e-02, 2.16666609e-02, 2.16666609e-02, ...,
5.83333522e-03, 5.83333522e-03, 9.16666538e-03],
[-1.24999993e-02, -1.16666667e-02, -1.16666667e-02, ...,
-2.41666622e-02, -2.24999972e-02, -1.58333369e-02],
[-3.33333388e-02, -4.16666716e-02, -4.08333391e-02, ...,
-3.58333364e-02, -3.50000039e-02, -3.16666588e-02]],
[[-1.92499995e-01, -1.99166656e-01, -1.94999993e-01, ...,
-1.84166640e-01, -1.84999973e-01, -1.88333303e-01],
[-1.02499962e-01, -1.19166642e-01, -1.30000025e-01, ...,
-5.58333099e-02, -6.08333051e-02, -7.99999833e-02],
[ 1.66672468e-03, -3.66666317e-02, -7.16666579e-02, ...,
6.41666055e-02, 6.16666675e-02, 3.58332992e-02],
...
-6.33333176e-02, -7.75000006e-02, -6.99999928e-02],
[-7.66666606e-02, -7.58333281e-02, -7.49999955e-02, ...,
-7.75000006e-02, -7.75000006e-02, -7.66666606e-02],
[-6.33333325e-02, -6.33333325e-02, -6.16666675e-02, ...,
-6.41666725e-02, -6.33333325e-02, -6.33333325e-02]],
[[-2.00003386e-03, -1.99997425e-03, -5.99998236e-03, ...,
1.40000284e-02, 5.99998236e-03, 7.99998641e-03],
[ 2.04000026e-01, 2.05999970e-01, 1.89999998e-01, ...,
1.85999990e-01, 1.96000040e-01, 2.06000030e-01],
[ 4.20000017e-01, 4.42000031e-01, 4.37999964e-01, ...,
3.81999969e-01, 4.00000036e-01, 4.12000000e-01],
...,
[ 6.08000159e-01, 1.19200015e+00, 1.97000027e+00, ...,
-3.76000047e-01, -1.43999934e-01, 1.29999876e-01],
[ 1.02699986e+01, 1.19820004e+01, 1.39359999e+01, ...,
6.39399910e+00, 7.50599957e+00, 8.78999996e+00],
[ 2.75340004e+01, 2.88440018e+01, 3.02239990e+01, ...,
2.39559994e+01, 2.50839996e+01, 2.62800007e+01]]],
dtype=float32)
Coordinates:
* lat (lat) float32 288B 88.75 86.25 83.75 81.25 ... -83.75 -86.25 -88.75
* lon (lon) float32 576B 1.25 3.75 6.25 8.75 ... 351.2 353.8 356.2 358.8
* time (time) datetime64[ns] 352B 1979-01-01 1980-01-01 ... 2022-01-01
The shape of the output should be the same as that of da (the monthly data).
The following is a crude method of doing what I want using a for-loop:
>>> years=yearly_mean.indexes['time'].year
>>> d=da.copy()
>>> for idx, year in enumerate(years):
... d[idx*12:(idx*12)+12]=da[idx*12:(idx*12)+12]-yearly_mean[idx]
>>> print(d)
<xarray.DataArray 'precip' (time: 521, lat: 72, lon: 144)> Size: 22MB
array([[[-3.108333e-01, -3.041667e-01, ..., -3.183334e-01, -3.083333e-01],
[-3.525000e-01, -3.466667e-01, ..., -3.641667e-01, -3.616667e-01],
...,
[-1.250000e-02, -1.166667e-02, ..., -2.250000e-02, -1.583334e-02],
[-3.333334e-02, -4.166667e-02, ..., -3.500000e-02, -3.166666e-02]],
[[-2.208333e-01, -2.141667e-01, ..., -2.283334e-01, -2.283333e-01],
[-2.125000e-01, -2.066667e-01, ..., -2.241667e-01, -2.216667e-01],
...,
[-6.250000e-02, -6.166667e-02, ..., -6.250000e-02, -5.583334e-02],
[-1.033333e-01, -1.016667e-01, ..., -1.050000e-01, -9.166666e-02]],
...,
[[-1.620000e-01, -1.620000e-01, ..., -1.840000e-01, -1.720000e-01],
[-2.860000e-01, -2.940000e-01, ..., -2.740000e-01, -2.840000e-01],
...,
[-2.080000e+00, -2.828000e+00, ..., -8.640003e-01, -1.430000e+00],
[-1.093600e+01, -1.149600e+01, ..., -9.905998e+00, -1.041000e+01]],
[[-2.000034e-03, -1.999974e-03, ..., 5.999982e-03, -2.000004e-03],
[-1.360000e-01, -1.440000e-01, ..., -1.140000e-01, -1.240000e-01],
...,
[-5.680000e+00, -6.368000e+00, ..., -4.564001e+00, -5.080000e+00],
[-1.351600e+01, -1.401600e+01, ..., -1.258600e+01, -1.304000e+01]]],
dtype=float32)
Coordinates:
* lat (lat) float32 288B 88.75 86.25 83.75 81.25 ... -83.75 -86.25 -88.75
* lon (lon) float32 576B 1.25 3.75 6.25 8.75 ... 351.2 353.8 356.2 358.8
* time (time) datetime64[ns] 4kB 1979-01-01 1979-02-01 ... 2022-05-01
Attributes:
long_name: Average Monthly Rate of Precipitation
valid_range: [ 0. 70.]
units: mm/day
precision: 2
var_desc: Precipitation
dataset: CPC Merged Analysis of Precipitation Enhanced
level_desc: Surface
statistic: Mean
parent_stat: Mean
actual_range: [ 0. 144.49]
That is, I want to subtract from each monthly value in da the corresponding year's annual mean, which is stored in yearly_mean.
Is there a way to do it efficiently in Xarray, instead of using a for-loop?
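You could use DataArray.reindex to expand your yearly data onto the time axis of your monthly data; with method="ffill", each year's mean (stamped at January 1) is forward-filled across the months of that year. A possible solution could look like: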
ds_diff = da - yearly_mean.reindex(time=da.time, method="ffill")
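As a possible alternative, grouped arithmetic may also do the job. The sketch below is only an illustration under stated assumptions: it builds a small synthetic stand-in for da (random values on a hypothetical 3 x 4 grid, same monthly time axis as in the question), shows that plain subtraction keeps only the 44 timestamps the two arrays share (xarray aligns operands on their common coordinate labels by default), and then subtracts each year's mean via groupby("time.year"). The names by_year, yearly_by_group, plain, anomaly and anomaly_reindex are made up for this example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the monthly data in the question (hypothetical values
# and grid; only the time axis mirrors the original: 1979-01 to 2022-05).
time = pd.date_range("1979-01-01", "2022-05-01", freq="MS")
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.random((time.size, 3, 4), dtype=np.float32),
    dims=("time", "lat", "lon"),
    coords={
        "time": time,
        "lat": [10.0, 0.0, -10.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    name="precip",
)

# Yearly means as in the question, stamped at the start of each year.
yearly_mean = da.resample(time="YS").mean()

# Plain subtraction aligns on matching time labels, so only the
# January timestamps present in both arrays survive.
plain = da - yearly_mean

# Grouped arithmetic: the per-year mean has a "year" dimension, and
# subtracting it from the grouped object applies each year's mean
# to the months belonging to that year.
by_year = da.groupby("time.year")
yearly_by_group = by_year.mean("time")
anomaly = by_year - yearly_by_group

# Cross-check against the reindex/ffill approach shown above.
anomaly_reindex = da - yearly_mean.reindex(time=da.time, method="ffill")

print(plain.sizes["time"], anomaly.sizes["time"])
```

The groupby result carries an extra non-index year coordinate along time; if that is unwanted, anomaly.drop_vars("year") should remove it. Note that in both approaches the mean for 2022 is taken over the five available months only.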