Tags: python, multidimensional-array, point, python-xarray, clip

How to clip multiple points at specific (varying) timesteps in a 3D Array?


I want to clip a multidimensional array with points (a shapefile). The points are specific events, and each one has a lat, lon and time value. From the shapefile's lat, lon and time columns I create one list each, and in the next step I use these lists to select/clip the multidimensional array (with xarray's .sel method):

lons = pts.geometry.x.to_list()
lats = pts.geometry.y.to_list()
time = pts.time.to_list()

values_pts = array_3d.sel(lon=lons, lat=lats, time=time, method="nearest")

Once lat, lon and time are split into separate lists, they lose their relation to one another: .sel applies each list independently along its own dimension, so every point is cut out at every requested timestep instead of only at the date on which it occurred. Do you have any ideas how I could clip each lat/lon at its own timestep in a 3D array?
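To make the problem concrete, here is a minimal sketch on a small synthetic cube. The coordinates, values and names such as cube, ev_lons and outer are hypothetical and only stand in for the real data. With plain lists, .sel performs orthogonal ("outer") indexing, so two events come back as a 2 x 2 x 2 block rather than as two values:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy cube: 4 daily timesteps on a 3 x 3 lat/lon grid
times = pd.date_range("2020-01-01", periods=4, freq="D")
cube = xr.DataArray(
    np.arange(4 * 3 * 3).reshape(4, 3, 3),
    coords={"time": times, "lat": [10.0, 20.0, 30.0], "lon": [100.0, 110.0, 120.0]},
    dims=("time", "lat", "lon"),
)

# Two hypothetical events, each with its own lon, lat and time
ev_lons = [100.0, 120.0]
ev_lats = [10.0, 30.0]
ev_times = [times[0], times[3]]

# Plain lists are applied independently along each dimension,
# so every (time, lat, lon) combination is returned
outer = cube.sel(lon=ev_lons, lat=ev_lats, time=ev_times, method="nearest")
print(outer.shape)  # (2, 2, 2): 8 values instead of 2
```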


Solution

  • Rather than converting x, y, and time to lists, convert them to xarray.DataArray objects using pd.Series.to_xarray(). This lets you use xarray's vectorized (advanced) indexing: instead of filtering the lat, lon, and time dimensions independently, the selection reshapes the array to conform to the index of the selectors. The following pulls the requested points out of the array, keeps each point's lat, lon, and time together, and creates a new dimension matching the index of your dataframe:

    lons = pts.geometry.x.to_xarray()
    lats = pts.geometry.y.to_xarray()
    time = pts.time.to_xarray()
    
    # because lons, lats, and time all share the same indexing
    # dim, which is the index of `pts`, the following will
    # pull the points you're requesting out of the array and
    # reshape them into a 1-D vector indexed by that common
    # indexing dimension
    values_pts = array_3d.sel(
        lon=lons, lat=lats, time=time, method="nearest"
    )
    
    # you could now convert this back to pandas if desired:
    # a pandas.Series if array_3d is a DataArray ...
    values_pts.to_series()
    # ... or a pandas.DataFrame if array_3d is a Dataset:
    # values_pts.to_dataframe()