python, pandas, python-xarray

How to add new value for a given dimension and coordinate in xarray?


I have data with the following xarray Dataset representation:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'var': np.random.rand(16),
    'country': 4*['UK'] + 4*['US'] + 4*['FR'] + 4*['DE'],
    'time_delta': 4*list(pd.timedelta_range(
        start='30D',
        periods=4,
        freq='30D'
    )),
})

ds = df.set_index(['country','time_delta']).to_xarray()

I want to add a new value of the variable at a new coordinate of a given dimension, while keeping the existing dimensions:

Set var to 0 at coordinate '0D' of dimension time_delta, while preserving the other existing dimensions (in this case, country).

In pandas I can do this via:

# 1. Pivot long to wide
df_wide = df.pivot(
    index='country',
    columns='time_delta'
).droplevel(0,axis=1)

# 2. Set value
df_wide['0D'] = 0

# 3. Melt wide to long
df_new = df_wide.melt(ignore_index=False).rename(
    columns={'value': 'var'}
).reset_index()

ds_new = df_new.set_index(['country','time_delta']).to_xarray()

Is there a general way to do this in xarray, going directly from ds to ds_new?

Edit: The line below fails with the message: KeyError: "not all values found in index 'time_delta'"

ds['var'].loc[{'time_delta': '0D'}] = 0

Solution

  • In xarray, you can achieve the same result by concatenating a new slice along the 'time_delta' dimension:

    import pandas as pd
    import numpy as np
    import xarray as xr
    
    # Original dataset creation
    df = pd.DataFrame({
        'var': np.random.rand(16),
        'country': 4*['UK'] + 4*['US'] + 4*['FR'] + 4*['DE'],
        'time_delta': 4*list(pd.timedelta_range(
            start='30D',
            periods=4,
            freq='30D'
        )),
    })
    
    ds = df.set_index(['country', 'time_delta']).to_xarray()
    
    # Build a one-step slice for the new '0D' coordinate, filled with zeros
    # and carrying every other dimension (here 'country') of the original
    new_time_delta = pd.to_timedelta(['0D'])
    new_slice = xr.zeros_like(ds.isel(time_delta=[0])).assign_coords(
        time_delta=new_time_delta
    )
    
    # Concatenate whole datasets rather than assigning into ds['var']:
    # an assigned DataArray is aligned to the dataset's existing time_delta
    # index, so the new label would not be added. ds itself is unchanged.
    ds_new = xr.concat([new_slice, ds], dim='time_delta')
    
    # Display the new dataset
    print(ds_new)
    

    *This is an edited version of the original answer!

    This code uses xr.concat to add a slice at the new coordinate ('0D') along the 'time_delta' dimension, rather than the .loc accessor, which can only assign to labels that already exist in the index (hence the KeyError in the question). The result is stored in ds_new, and the original ds dataset is left unchanged.
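
An alternative worth considering is Dataset.reindex, which extends the index and fills the newly created positions in one call. The following is a minimal sketch, assuming the only missing label after reindexing is the new '0D' entry, so that fill_value=0 touches nothing else. The name new_index is introduced here for illustration only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same dataset as in the question
df = pd.DataFrame({
    'var': np.random.rand(16),
    'country': 4*['UK'] + 4*['US'] + 4*['FR'] + 4*['DE'],
    'time_delta': 4*list(pd.timedelta_range(
        start='30D',
        periods=4,
        freq='30D'
    )),
})
ds = df.set_index(['country', 'time_delta']).to_xarray()

# Prepend the new label to the existing time_delta index
new_index = pd.to_timedelta(['0D']).append(ds.indexes['time_delta'])

# reindex returns a new dataset; labels absent from ds (here only '0D')
# are filled with fill_value for every country
ds_new = ds.reindex(time_delta=new_index, fill_value=0)

print(ds_new)
```

Because reindex fills every label that was missing, this only matches the "set the new coordinate to 0" requirement when the original data has no other gaps along time_delta. If it does, the concat approach is the safer choice.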