Tags: subset, slice, netcdf, python-xarray

Python xarray: Processing data for a loop with method='nearest' at different locations


Is it possible to have an xarray object with multiple columns that all share the same coordinates? In the following example I create an xarray dataset and then extract time series at different locations. To do this, however, I have to create NumPy arrays to store the extracted data and the matched coordinates.

import numpy as np
import pandas as pd
import xarray as xr

#Sample from the data in the netCDF file 
ds = xr.Dataset()
ds['temp'] = xr.DataArray(data=np.random.rand(2,3,4), dims=['time','lat','lon'], 
             coords=dict(time=pd.date_range('1900-1-1',periods=2,freq='D'), 
                         lat=[25.,26.,27.],lon=[-85.,-84.,-83.,-82.]))
display(ds)  # display() is available in Jupyter/IPython; use print(ds) elsewhere

#lat and lon locations to extract temp values
locations=np.array([[25.6, -84.7], [26, -83], [26.5, -84.1]])

#Extract time series at different locations
temp=np.empty([ds['temp'].shape[0], len(locations)])  # (time, location)
lat_lon=np.empty([len(locations),2])                  # matched grid coordinates

for n in range(locations.shape[0]):
    # Select the nearest grid cell once and reuse the result
    point=ds.sel(lat=locations[n,0], 
                 lon=locations[n,1], method='nearest')
    lat_lon[n,0]=point.coords['lat'].values
    lat_lon[n,1]=point.coords['lon'].values
    temp[:,n]=point['temp'].values

print(temp)
print(lat_lon)

#Find maximum temp across all locations for each timestamp:
temp=temp.max(axis=1)

The output of this code is:

array([[[0.67465371, 0.0710136 , 0.03263631, 0.41050204],
        [0.26447469, 0.46503577, 0.5739435 , 0.33725726],
        [0.20353832, 0.01441925, 0.26728572, 0.70531547]],

       [[0.75418953, 0.20321738, 0.41129902, 0.96464691],
        [0.53046103, 0.88559914, 0.20876142, 0.98030988],
        [0.48009467, 0.7906767 , 0.09548439, 0.61088112]]])
Coordinates:
time (time) datetime64[ns] 1900-01-01 1900-01-02
lat (lat) float64 25.0 26.0 27.0
lon (lon) float64 -85.0 -84.0 -83.0 -82.0
temp (time, lat, lon) float64 0.09061 0.6634 ... 0.5696 0.4438
Attributes: (0)


[[0.26447469 0.5739435  0.01441925]
 [0.53046103 0.20876142 0.7906767 ]]
[[ 26. -85.]
 [ 26. -83.]
 [ 27. -84.]]

More simply, is there a way to find the maximum temp across all locations for every timestamp without creating the intermediate temp array?


Solution

  • When you create the sample data, you specify 3 values of latitude and 4 values of longitude. That means 12 values in total, on a 2D grid (3D if we add time).

    When you want to query values for 3 specific points, you have to query each point individually. As far as I know, there are two ways to do that:

    • Write a loop and store the result in an intermediate array (your solution)
    • Stack dimensions and query longitude and latitude simultaneously.

    First, you have to express your locations as a list/array of tuples:

    locations=np.array([[25.6, -84.7], [26, -83], [26.5, -84.1]])
    coords=[(coord[0], coord[1]) for coord in locations]
    print(coords)
    
    [(25.6, -84.7), (26.0, -83.0), (26.5, -84.1)]
    

    Then you interpolate your data onto the specified latitudes and longitudes, stack latitude and longitude into a new dimension coord, and select your points.

    (ds
     .interp(lon=locations[:,1], lat=locations[:,0], method='linear') # interpolate onto the requested lats and lons
     .stack(coord=['lat','lon']) # from 3x3 grid to list of 9 points
     .sel(coord=coords) # select your three points
     .temp.max(dim='coord') # get largest temp value from the coord dimension
    )
    
    array([0.81316195, 0.56967184]) # your largest values at both timestamps 
    

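    The downside is that xarray doesn't support interpolation on an unlabeled multi-index, which is why you first need to interpolate the grid onto your set of latitudes and longitudes (NOT simply find the nearest neighbor). As a consequence, the values returned are linearly interpolated between grid cells, so they will generally differ from the nearest-neighbor values produced by the loop in the question.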
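
A third option, not covered in the answer above, is xarray's vectorized (pointwise) indexing: if the latitude and longitude indexers are DataArrays that share a new dimension, `.sel(..., method='nearest')` picks one grid cell per location instead of the full lat x lon cross product. The following is a minimal sketch under that assumption. The dimension name `points` and the variable names are illustrative choices, and the seeded random data only stands in for the netCDF file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in for the netCDF data (seeded so the run is reproducible)
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"temp": (("time", "lat", "lon"), rng.random((2, 3, 4)))},
    coords={
        "time": pd.date_range("1900-01-01", periods=2, freq="D"),
        "lat": [25.0, 26.0, 27.0],
        "lon": [-85.0, -84.0, -83.0, -82.0],
    },
)

locations = np.array([[25.6, -84.7], [26, -83], [26.5, -84.1]])

# Indexers that share the (hypothetical) dimension name "points":
# xarray pairs them element by element rather than forming a grid.
lat_idx = xr.DataArray(locations[:, 0], dims="points")
lon_idx = xr.DataArray(locations[:, 1], dims="points")

# One nearest grid cell per location; the result keeps the matched
# lat/lon values as coordinates along "points".
points = ds.sel(lat=lat_idx, lon=lon_idx, method="nearest")

# Maximum temperature across all locations for every timestamp
max_temp = points["temp"].max(dim="points")

print(points)
print(max_temp.values)
```

Here `points["temp"]` plays the role of the intermediate `temp` array from the question, and `points["lat"]` and `points["lon"]` replace the `lat_lon` array, so no manual bookkeeping is needed.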