
Add values to an existing coordinate of an xarray DataArray


Let's say we open a NetCDF file and get a DataArray da like this:

<xarray.DataArray (x: 2, y: 3)>
array([[0.50793919, 0.49505336, 0.19573345],
       [0.7830897 , 0.82954952, 0.19427877]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1 2

Now, our target DataArray da_new looks like

<xarray.DataArray (x: 4, y: 3)>
array([[0.50793919, 0.49505336, 0.19573345],
       [0.7830897 , 0.82954952, 0.19427877],
       [       nan,        nan,        nan],
       [       nan,        nan,        nan]])
Coordinates:
  * x        (x) int64 0 1 2 3
  * y        (y) int64 0 1 2

To reach our target, we can construct a new DataArray filled with NaN and then fill it in with the data from da, something like

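import numpy as np
import xarray as xr

da_new = xr.DataArray(
    data=np.full([4, 3], fill_value=np.nan),
    dims=['x', 'y'],
    coords=dict(
        x=range(4),
        y=range(3)
    )
)
# .loc slices by label and includes both endpoints, so 0:1 selects x = 0 and x = 1.
da_new.loc[0:1, :] = da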

However, I find this method a bit tedious, especially when the DataArray has many dimensions.

So I'm wondering whether there is a simpler, more explicit way to do this. Many thanks.


Solution

  • The sequence of steps is sound (create a new placeholder array, then copy the data into it), but we can take the dimensions and coordinates from the original, so anything that stays the same does not have to be hard-coded for the new array.

    Make some data

    import numpy as np
    import xarray as xr
    
    # Toy data.
    ar = np.array([
        [0.50793919, 0.49505336, 0.19573345],
        [0.7830897 , 0.82954952, 0.19427877]])
    da = xr.DataArray(
        data = ar,
        dims = ['x','y'],
        coords=dict(
            x = range(2),
            y = range(3)
        ))
    da
    # array([[0.50793919, 0.49505336, 0.19573345],
    #        [0.7830897 , 0.82954952, 0.19427877]])
    # Coordinates:
    #    x   (x) int64 0 1
    #    y   (y) int64 0 1 2
    

    Make a bigger placeholder array

    # Map some dimensions to new coordinates of any length.
    new_coords = dict(x=range(4))
    
    # Make a placeholder array, replacing some coordinates with new ones.
    # No data is passed, so xarray fills the array with NaN.
    da_bigger = xr.DataArray(
        dims=da.dims,
        coords=dict(da.coords, **new_coords))
    
    da_bigger
    
    # array([[nan, nan, nan],
    #        [nan, nan, nan],
    #        [nan, nan, nan],
    #        [nan, nan, nan]])
    # Coordinates:
    #    x  (x) int64 0 1 2 3
    #    y  (y) int64 0 1 2
    

    Copy data

    # Copy data from original into corresponding coordinates of bigger array.
    da_bigger.loc[{k: da[k] for k in new_coords}] = da
    
    da_bigger
    
    # array([[0.50793919, 0.49505336, 0.19573345],
    #        [0.7830897 , 0.82954952, 0.19427877],
    #        [       nan,        nan,        nan],
    #        [       nan,        nan,        nan]])
    # Coordinates:
    #     x   (x) int64 0 1 2 3
    #     y   (y) int64 0 1 2
    

    Assumption: The new coordinates are a superset of the original.
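
    This assumption can be checked before copying. The sketch below is one way to do it, not the only one: the helper name check_superset is hypothetical (it is not part of xarray), and it assumes the coordinate labels are hashable scalars such as integers.

```python
import numpy as np
import xarray as xr


def check_superset(da, new_coords):
    """Return, per dimension, the original labels missing from new_coords.

    Hypothetical helper, not part of xarray.
    """
    missing = {}
    for dim, labels in new_coords.items():
        lost = set(da[dim].values.tolist()) - set(labels)
        if lost:
            missing[dim] = sorted(lost)
    return missing


# Small stand-in array with the same layout as the toy data.
da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=["x", "y"],
    coords=dict(x=range(2), y=range(3)),
)

ok = check_superset(da, dict(x=range(4)))      # every original label is kept
bad = check_superset(da, dict(x=range(1, 4)))  # label 0 would be lost
```

    An empty result means the .loc assignment can find every original label in the bigger array.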

    EDIT: Use xr.align

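    We can get xr.align to do the heavy lifting for the copy operation. With join="outer", align takes the union of the coordinate labels of its arguments and fills the positions that were missing with NaN. In this example the empty placeholder array is passed directly into align without storing it in a named variable, and the aligned placeholder that comes back is discarded.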

    da_big2, _ = xr.align(
        da,
        xr.DataArray(
            dims=da.dims,
            coords=dict(da.coords, **new_coords)),
        join="outer")
    da_big2
    # array([[0.50793919, 0.49505336, 0.19573345],
    #        [0.7830897 , 0.82954952, 0.19427877],
    #        [       nan,        nan,        nan],
    #        [       nan,        nan,        nan]])
    # Coordinates:
    #     x   (x) int64 0 1 2 3
    #     y   (y) int64 0 1 2
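
    For this particular case, xarray's DataArray.reindex may be the shortest route. It is a different technique from the placeholder-and-copy approach: it conforms the array to new labels along the named dimensions and, by default, fills labels that were not present with NaN. A minimal sketch, rebuilding the same toy data so it runs on its own (the name da_big3 is just for illustration):

```python
import numpy as np
import xarray as xr

# Same toy data as in the first step.
da = xr.DataArray(
    data=np.array([
        [0.50793919, 0.49505336, 0.19573345],
        [0.7830897, 0.82954952, 0.19427877]]),
    dims=["x", "y"],
    coords=dict(x=range(2), y=range(3)),
)

# Only the dimensions that change need to be named; y is left untouched.
da_big3 = da.reindex(x=range(4))
```

    Unlike the .loc copy, reindex does not need the new labels to be a superset of the old ones: original labels that are absent from the new index are simply dropped from the result, so it is worth checking that this is what you want.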