Is it possible to have an xarray with multiple columns all having the same coordinates? In the following example I create an xarray and then want to extract time series data at different locations. However, to do this I have to create NumPy arrays to store the data and its coordinates.
import numpy as np
import pandas as pd
import xarray as xr

#Sample from the data in the netCDF file
ds['temp'] = xr.DataArray(data=np.random.rand(2,3,4), dims=['time','lat','lon'],
coords=dict(time=pd.date_range('1900-1-1',periods=2,freq='D'),
lat=[25.,26.,27.],lon=[-85.,-84.,-83.,-82.]))
display(ds)
#lat and lon locations to extract temp values
locations=np.array([[25.6, -84.7], [26, -83], [26.5, -84.1]])
#Extract time series at different locations
temp=np.empty([ds.shape[0], len(locations)])
lat_lon=np.empty([len(locations),2])
for n in range(locations.shape[0]):
    lat_lon[n,0]=ds.sel(lat=locations[n,0],
                        lon=locations[n,1], method='nearest').coords['lat'].values
    lat_lon[n,1]=ds.sel(lat=locations[n,0],
                        lon=locations[n,1], method='nearest').coords['lon'].values
    temp[:,n]=ds.sel(lat=locations[n,0],
                     lon=locations[n,1], method='nearest')
print(temp)
print(lat_lon)
#Find maximum temp for all locations:
temp=temp.max(1)
The output of this code is:
array([[[0.67465371, 0.0710136 , 0.03263631, 0.41050204],
[0.26447469, 0.46503577, 0.5739435 , 0.33725726],
[0.20353832, 0.01441925, 0.26728572, 0.70531547]],
[[0.75418953, 0.20321738, 0.41129902, 0.96464691],
[0.53046103, 0.88559914, 0.20876142, 0.98030988],
[0.48009467, 0.7906767 , 0.09548439, 0.61088112]]])
Coordinates:
time (time) datetime64[ns] 1900-01-01 1900-01-02
lat (lat) float64 25.0 26.0 27.0
lon (lon) float64 -85.0 -84.0 -83.0 -82.0
temp (time, lat, lon) float64 0.09061 0.6634 ... 0.5696 0.4438
Attributes: (0)
[[0.26447469 0.5739435 0.01441925]
[0.53046103 0.20876142 0.7906767 ]]
[[ 26. -85.]
[ 26. -83.]
[ 27. -84.]]
More simply, is there a way to find the maximum temp across all locations for every timestamp without creating the intermediate temp array?
When you create the sample data, you specify 3 values of latitude and 4 values of longitude. That means 12 values in total, on a 2D grid (3D if we add time).
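As a rough illustration of that grid structure, here is a minimal sketch. The names `da`, `lats`, `lons` and `block` are made up for this example, and the random values are stand-ins, not the numbers from the question. It shows that passing plain lists of latitudes and longitudes to `sel` is applied per dimension, so it returns every lat/lon combination rather than three paired points:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the question's data: 2 times x 3 lats x 4 lons.
da = xr.DataArray(
    np.random.rand(2, 3, 4),
    dims=["time", "lat", "lon"],
    coords={
        "time": pd.date_range("1900-01-01", periods=2, freq="D"),
        "lat": [25.0, 26.0, 27.0],
        "lon": [-85.0, -84.0, -83.0, -82.0],
    },
    name="temp",
)

# 3 latitudes x 4 longitudes = 12 grid cells per timestamp.
cells_per_timestamp = da.sizes["lat"] * da.sizes["lon"]

# Plain lists are applied independently per dimension (an outer product),
# so three latitudes and three longitudes select a 3 x 3 block per
# timestamp, not three individual points.
lats = [25.6, 26.0, 26.5]
lons = [-84.7, -83.0, -84.1]
block = da.sel(lat=lats, lon=lons, method="nearest")

print(cells_per_timestamp, block.shape)
```

The selected block has one row per requested latitude and one column per requested longitude, which is why the three points of interest still have to be picked out of it afterwards.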
When you want to query values for 3 specific points, you have to query each point individually. As far as I know, there are two ways to do that; the one shown here interpolates onto the requested coordinates and then selects the points.
First, you have to express your locations as a list/array of tuples:
locations=np.array([[25.6, -84.7], [26, -83], [26.5, -84.1]])
coords=[(coord[0], coord[1]) for coord in locations]
print(coords)
[(25.6, -84.7), (26.0, -83.0), (26.5, -84.1)]
Then you interpolate your data at the specified locations, stack latitude and longitude into a new dimension coord, and select your points.
(ds
.interp(lon=locations[:,1], lat=locations[:,0], method='linear') # interpolate on the grid
.stack(coord=['lat','lon']) # from 3x3 grid to list of 9 points
.sel(coord=coords) # select your three points
.temp.max(dim='coord') # get largest temp value from the coord dimension
)
array([0.81316195, 0.56967184]) # your largest values at both timestamps
The downside is that xarray doesn't support interpolation on an unlabeled multi-index, which is why you first need to interpolate the grid (NOT simply find the nearest neighbor) onto your set of latitudes and longitudes.
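If nearest-neighbour values are enough (which is what the loop in the question computes), one alternative that may be worth trying is xarray's pointwise (vectorized) indexing: indexers passed as `DataArray` objects that share a dimension are paired element by element instead of being combined into a grid. This is only a sketch and a different technique from the interpolate-and-stack approach above. The dimension name `points` and the variable names are arbitrary choices for illustration, and the data are random stand-ins.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the question's data (random values).
values = np.random.rand(2, 3, 4)
da = xr.DataArray(
    values,
    dims=["time", "lat", "lon"],
    coords={
        "time": pd.date_range("1900-01-01", periods=2, freq="D"),
        "lat": [25.0, 26.0, 27.0],
        "lon": [-85.0, -84.0, -83.0, -82.0],
    },
    name="temp",
)

locations = np.array([[25.6, -84.7], [26.0, -83.0], [26.5, -84.1]])

# Indexers that share the (arbitrarily named) "points" dimension are matched
# pairwise: (lat[0], lon[0]), (lat[1], lon[1]), ...
lat_idx = xr.DataArray(locations[:, 0], dims="points")
lon_idx = xr.DataArray(locations[:, 1], dims="points")

# One time series per location; the matched grid lat/lon values are kept
# as coordinates along "points".
series = da.sel(lat=lat_idx, lon=lon_idx, method="nearest")

# Maximum across locations for every timestamp, without a manual NumPy buffer.
max_per_time = series.max(dim="points")

print(series.lat.values, series.lon.values)
print(max_per_time.values)
```

`interp` is documented to accept `DataArray` indexers in the same way, which should give pointwise interpolated values without the stack and select steps, although that variant is not exercised in this sketch.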