Search code examples
pythonarraysgeolocationrasterpython-xarray

Create Xarray dataset from grd files and list of grid coordinates


I am trying to use a Canadian gridded historical dataset of temperature anomalies but it seems that I don't have the skills to pull that off. The grd file are temperatures anomalies on what I believe is a highly regular grid. I have no experience with that kind of grid and I am having trouble building the xarray dataset.

What I have (a subset of the grd and the text file is accessible here) :

  • 2075 '.grd' files ('t190001.grd' to 't202112.grd' following "t{year}{month}.grd" structure)
  • 1 txt file listing the grid coordinates called "CANGRD_points_LL.txt"

From this I would like to build a xarray dataset in order to do some analysis.

Naively, I thought the grid files were already georeferenced and all so I started by doing this :

import glob
import rioxarray as rio
import pandas as pd
import numpy as np
import xarray as xr

#not used for the moment even though I believe that will be needed
#df = pd.read_csv(r"CANGRD_points_LL.txt", sep = ' ', header=None)

list_files = sorted(set(glob.glob(r"t?????[0-2].grd" ) + glob.glob(r"t????0[0-9].grd" )))

times = pd.date_range("1900/01/01",freq='M', periods= len(list_files))

datarrays = [rio.open_rasterio(rst, masked=True,band_as_variable=True).assign_coords(time = t).expand_dims(dim='time').squeeze() for rst,t in zip(list_files, times)]
ds = xr.concat(datarrays,dim='time').rename({'band_1' : 'tas', 'y': 'lat', 'x' : 'lon'})

But as I plotted the results it became evident that my coordinates were only the indices of the pixels : enter image description here

So I believe I have to use the txt file provided, however, I have no idea how to make the xarray grid using the grid's coordinates and how to make that match with my array obtained by loading a grid via rioxarray. Here is a sample, the complete file is available above. What baffles me is that most of the 11874 lines of the dataframe resulting from the txt file seem to be unique, so how could I fit an array of dimensions 125 lon by 95 lat into it.

    0   1   2   3
0   0   0   40.0451 -129.8530
1   0   1   40.1780 -129.3650
2   0   2   40.3080 -128.8740
3   0   3   40.4348 -128.3801
4   0   4   40.5585 -127.8834
5   0   5   40.6790 -127.3840
6   0   6   40.7963 -126.8817
7   0   7   40.9104 -126.3768
8   0   8   41.0211 -125.8693
9   0   9   41.1286 -125.3591
10  0   10  41.2327 -124.8465
11  0   11  41.3335 -124.3314
12  0   12  41.4308 -123.8140
13  0   13  41.5247 -123.2942
14  0   14  41.6151 -122.7722
15  0   15  41.7020 -122.2481
16  0   16  41.7853 -121.7218
17  0   17  41.8651 -121.1936
18  0   18  41.9413 -120.6634
19  0   19  42.0139 -120.1313
20  0   20  42.0828 -119.5975
21  0   21  42.1481 -119.0620
22  0   22  42.2097 -118.5249
23  0   23  42.2675 -117.9863
24  0   24  42.3216 -117.4462
25  0   25  42.3720 -116.9049
26  0   26  42.4186 -116.3622
27  0   27  42.4614 -115.8185
28  0   28  42.5005 -115.2736
29  0   29  42.5357 -114.7279
30  0   30  42.5670 -114.1812
31  0   31  42.5946 -113.6338
32  0   32  42.6182 -113.0857
33  0   33  42.6381 -112.5371
34  0   34  42.6540 -111.9880
35  0   35  42.6661 -111.4385
36  0   36  42.6743 -110.8888
37  0   37  42.6786 -110.3389
38  0   38  42.6791 -109.7889
39  0   39  42.6757 -109.2390
40  0   40  42.6684 -108.6892
41  0   41  42.6572 -108.1397
42  0   42  42.6421 -107.5905
43  0   43  42.6232 -107.0417
44  0   44  42.6004 -106.4935
45  0   45  42.5738 -105.9459
46  0   46  42.5433 -105.3991
47  0   47  42.5090 -104.8531
48  0   48  42.4708 -104.3081
49  0   49  42.4289 -103.7640

Here is the view of one grid file loaded as xarray,

enter image description here

Any help would be greatly appreciated! Thank you so much


Solution

  • I directly asked on the Xarray Github discussion here is the original answer from Keewis: https://github.com/pydata/xarray/discussions/7443#discussioncomment-4700261

    The grid file contains stacked 2D coordinates, which I guess is due to the grid's original coordinate system not being aligned with the lat / lon axes.

    To read the coordinates into 2D coordinates you can use:

    df = pd.read_csv(r"CANGRD_points_LL.txt", sep=" ", header=None, names=["y", "x", "lat", "lon"])
    grid = df.set_index(["y", "x"]).to_xarray().set_coords(["lat", "lon"])
    
    raw = xr.concat([...], dim="time")
    ds = xr.merge([raw, grid]).assign_coords(time=times).rename_vars(...)