I am trying to do a rolling mean to a dask array within xarray. My issue may lay in the rechunking before the rolling mean. I am getting a ValueError of conflicting sizes between data and coordinates. However, this arises within the rolling operation as I don't think there are conflicts in the data and coords of the array before going into the rolling operation.
Apologies for not creating data to test but my project data is quick to play with:
import xarray as xr
remote_data = xr.open_dataarray('http://iridl.ldeo.columbia.edu/SOURCES/.Models'\
'/.SubX/.RSMAS/.CCSM4/.hindcast/.zg/dods',
chunks={'L': 1, 'S': 1})
da = remote_data.isel(P=0,L=0,M=0,X=0,Y=0)
da_day_clim = da.groupby('S.dayofyear').mean('S')
print(da_day_clim)
#<xarray.DataArray 'zg' (dayofyear: 366)>
#dask.array<shape=(366,), dtype=float32, chunksize=(1,)>
#Coordinates:
# L timedelta64[ns] 12:00:00
# Y float32 -90.0
# M float32 1.0
# X float32 0.0
# P int32 500
# * dayofyear (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
# Do a 31-day rolling mean
# da_day_clim.rolling(dayofyear=31, center=True).mean()
# This brings up:
#ValueError: The overlapping depth 30 is larger than your
#smallest chunk size 1. Rechunk your array
#with a larger chunk size or a chunk size that
#more evenly divides the shape of your array.
# Read http://xarray.pydata.org/en/stable/dask.html
# and found http://xarray.pydata.org/en/stable/generated/xarray.Dataset.chunk.html#xarray.Dataset.chunk
# I could make a little PR to add the .chunk() into the ValeError message. Thoughts?
# Rechunk. Played around with a few values but decided on
# the len of dayofyear
da_day_clim2 = da_day_clim.chunk({'dayofyear': 366})
print(da_day_clim2)
#<xarray.DataArray 'zg' (dayofyear: 366)>
#dask.array<shape=(366,), dtype=float32, chunksize=(366,)>
#Coordinates:
# L timedelta64[ns] 12:00:00
# Y float32 -90.0
# M float32 1.0
# X float32 0.0
# P int32 500
# * dayofyear (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
# Rolling mean on this
da_day_clim_smooth = da_day_clim2.rolling(dayofyear=31, center=True).mean()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-57-6acf382cdd3d> in <module>()
4 da_day_clim = da.groupby('S.dayofyear').mean('S')
5 da_day_clim2 = da_day_clim.chunk({'dayofyear': 366})
----> 6 da_day_clim_smooth = da_day_clim2.rolling(dayofyear=31, center=True).mean()
~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/rolling.py in wrapped_func(self, **kwargs)
307 if self.center:
308 values = values[valid]
--> 309 result = DataArray(values, self.obj.coords)
310
311 return result
~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, encoding, fastpath)
224
225 data = as_compatible_data(data)
--> 226 coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
227 variable = Variable(dims, data, attrs, encoding, fastpath=True)
228
~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims)
79 raise ValueError('conflicting sizes for dimension %r: '
80 'length %s on the data but length %s on '
---> 81 'coordinate %r' % (d, sizes[d], s, k))
82
83 if k in sizes and v.shape != (sizes[k],):
ValueError: conflicting sizes for dimension 'dayofyear': length 351 on the data but length 366 on coordinate 'dayofyear'
The length 351 is related to 366-351=15 (half the window).
This turned out to be a bug in Xarray and was fixed in https://github.com/pydata/xarray/pull/2122
The fix will be in Xarray 0.10.4 which is slated for imminent release.