Tags: python, pandas, dataframe, pandas-groupby, python-xarray

Pandas DF to Xarray Dataset


Hi, initially I had an xarray Dataset like this:

<xarray.Dataset>
Dimensions:    (latitude: 721, longitude: 1400, time: 71)
Coordinates:
  * time       (time) datetime64[ns] 2000-12-31 2001-12-31 ... 2018-12-31
  * longitude  (longitude) float32 -22.5 -21.75 -21.0 -20.25 ... 43.5 44.25 45.0
  * latitude   (latitude) float32 72.0 71.25 70.5 69.75 ... 28.5 27.75 27.0
Data variables:
    tas      (time, latitude, longitude) float64 5.033e+05 ... 1.908e+05

I then converted it to a DataFrame and grouped by latitude and longitude to collect the tas values across the whole time dimension. Below is a sample of the resulting df, which has 1038239 records (721 * 1440), where each tas cell holds an array of 71 values (one per time step):

latitude    longitude   tas
-90.0        358.75     [50603.53125, 50002.609375, 50183.98828125, 49...
-90.0        359.00     [50603.53125, 50002.609375, 50183.98828125, 49...
-90.0        359.25     [50603.53125, 50002.609375, 50183.98828125, 49...
-90.0        359.50     [50603.53125, 50002.609375, 50183.98828125, 49...
-90.0        359.75     [50603.53125, 50002.609375, 50183.98828125, 49...
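For reference, here is a minimal sketch of how a frame with this layout can be produced. It runs on a small made-up dataset instead of the real one and uses groupby(...).agg(list), so each tas cell holds a Python list; the question does not show its exact groupby call, so that part is an assumption.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy stand-in for the real dataset: 3 times x 2 latitudes x 3 longitudes.
times = pd.to_datetime(["2000-12-31", "2001-12-31", "2002-12-31"])
lat = np.array([72.0, 71.25], dtype=np.float32)  # descending, as in the original
lon = np.array([-22.5, -21.75, -21.0], dtype=np.float32)
tas = np.arange(18, dtype=np.float64).reshape(3, 2, 3)
ds = xr.Dataset(
    {"tas": (("time", "latitude", "longitude"), tas)},
    coords={"time": times, "latitude": lat, "longitude": lon},
)

# Long format: one row per (time, latitude, longitude) triple, time varying slowest.
df = ds["tas"].to_dataframe().reset_index()

# One row per grid cell; "tas" holds that cell's time series as a list.
# groupby keeps the original row order inside each group, so each list is time-ordered.
df_groups = df.groupby(["latitude", "longitude"])["tas"].agg(list).reset_index()
print(df_groups)
```

Because groupby sorts its keys by default, the rows come out in ascending (latitude, longitude) order, not in the dataset's descending latitude order. That matters when reshaping back.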

I then performed some operation and created a new column, tas_new, with the same size as tas. Now I want to either create a new dataset or add this variable to the old one with the same dimensions (time, latitude, longitude), but I can't reshape it back to the original layout.

I tried taking all the values from tas_new and stacking them like this:

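import numpy as np
# One 1-D array of 71 values per (latitude, longitude) row; np.vstack stacks them into a 2-D array.
array_tuple = df_groups['tas_new'].values
arrays = np.vstack(array_tuple)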

This returns an array of shape (1038239, 71). How can I get it back to the original shape and add it as a variable to the xarray dataset, or create a new dataset from it?
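One way to get back to (time, latitude, longitude) without relying on row order is to label each row of the stacked array with its (latitude, longitude) pair and let xarray unstack it. Below is a minimal sketch on made-up toy data with the same layout; the doubling step is a hypothetical stand-in for whatever produced tas_new, and the "cell" dimension name is arbitrary.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data with the question's layout (3 times, 2 lats, 3 lons).
times = pd.to_datetime(["2000-12-31", "2001-12-31", "2002-12-31"])
lat = np.array([72.0, 71.25], dtype=np.float32)
lon = np.array([-22.5, -21.75, -21.0], dtype=np.float32)
tas = np.arange(18, dtype=np.float64).reshape(3, 2, 3)
ds = xr.Dataset(
    {"tas": (("time", "latitude", "longitude"), tas)},
    coords={"time": times, "latitude": lat, "longitude": lon},
)
df = ds["tas"].to_dataframe().reset_index()
df_groups = df.groupby(["latitude", "longitude"])["tas"].agg(list).reset_index()

# Hypothetical per-cell operation standing in for the real one.
df_groups["tas_new"] = [np.asarray(v) * 2.0 for v in df_groups["tas"]]

# (n_cells, n_time): one row per (latitude, longitude) pair, in df_groups order.
arrays = np.vstack(df_groups["tas_new"].to_numpy())

# Attach latitude/longitude as coordinates along "cell", one value per row.
da = xr.DataArray(
    arrays,
    dims=("cell", "time"),
    coords={
        "latitude": ("cell", df_groups["latitude"].to_numpy()),
        "longitude": ("cell", df_groups["longitude"].to_numpy()),
        "time": ds["time"].values,
    },
    name="tas_new",
)

# Turn "cell" into a (latitude, longitude) MultiIndex, unstack it into two dims,
# and put the dims in the same order as the original variable.
da = (
    da.set_index(cell=["latitude", "longitude"])
    .unstack("cell")
    .transpose("time", "latitude", "longitude")
)

# reindex_like restores the dataset's coordinate order (descending latitude here).
ds["tas_new"] = da.reindex_like(ds)
print(ds)
```

Because the values are placed by their coordinate labels rather than by position, a grid cell missing from df_groups would come back as NaN instead of silently shifting the remaining values.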

Expected Result:

<xarray.Dataset>
Dimensions:    (latitude: 721, longitude: 1400, time: 71)
Coordinates:
  * time       (time) datetime64[ns] 2000-12-31 2001-12-31 ... 2018-12-31
  * longitude  (longitude) float32 -22.5 -21.75 -21.0 -20.25 ... 43.5 44.25 45.0
  * latitude   (latitude) float32 72.0 71.25 70.5 69.75 ... 28.5 27.75 27.0
Data variables:
    tas      (time, latitude, longitude) float64 5.033e+05 ... 1.908e+05
    tas_new  (time, latitude, longitude) float64 5.033e+05 ... 1.908e+05

Alternatively, an array with dimensions (time, latitude, longitude) built from the DataFrame would also work.
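If a plain NumPy array is enough, the stacked array can also be reshaped directly, but only under two assumptions: every (latitude, longitude) pair is present in df_groups, and the rows are in groupby's default sorted order (latitude ascending, longitude varying fastest). A minimal sketch on made-up toy data, with doubling as a hypothetical stand-in for tas_new:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset (3 times, 2 latitudes in descending order, 3 longitudes).
times = pd.to_datetime(["2000-12-31", "2001-12-31", "2002-12-31"])
lat = np.array([72.0, 71.25], dtype=np.float32)
lon = np.array([-22.5, -21.75, -21.0], dtype=np.float32)
tas = np.arange(18, dtype=np.float64).reshape(3, 2, 3)
ds = xr.Dataset(
    {"tas": (("time", "latitude", "longitude"), tas)},
    coords={"time": times, "latitude": lat, "longitude": lon},
)
df = ds["tas"].to_dataframe().reset_index()
df_groups = df.groupby(["latitude", "longitude"])["tas"].agg(list).reset_index()
arrays = np.vstack(df_groups["tas"].to_numpy()) * 2.0  # stand-in for tas_new

# groupby sorted its keys, so rows run latitude-ascending with longitude varying fastest.
lat_sorted = np.sort(df_groups["latitude"].unique())
lon_sorted = np.sort(df_groups["longitude"].unique())
assert len(df_groups) == len(lat_sorted) * len(lon_sorted), "grid has missing cells"

# (n_cells, n_time) -> (n_lat, n_lon, n_time) -> (time, latitude, longitude)
cube = arrays.reshape(len(lat_sorted), len(lon_sorted), -1).transpose(2, 0, 1)

# Wrap with the sorted coordinates, then put latitude back in the dataset's order.
tas_new = xr.DataArray(
    cube,
    dims=("time", "latitude", "longitude"),
    coords={"time": ds["time"].values, "latitude": lat_sorted, "longitude": lon_sorted},
).reindex_like(ds)
ds["tas_new"] = tas_new
```

The reshape is cheap, but it trusts the row order completely; the label-based unstack approach is safer if df_groups may have been filtered or re-sorted.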


Solution

  • Once I had arrays = np.vstack(array_tuple), I flattened it into a single list of 1038239*71 values and added it as a new column to the original long-format DataFrame, with each value matched to its latitude-longitude-time row. Then I converted the whole DataFrame back to xarray.

    PS: The DataFrame was too large to convert to xarray in one go on a low-memory (<12 GB) system, so I split it into 7 parts, converted each part to xarray, and then concatenated them to get the full xarray Dataset.
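The approach described in this solution can be sketched roughly as follows, on made-up toy data. It assumes the parts are split along latitude (the PS does not say how the 7 parts were cut) and uses 2 parts instead of 7 only because the toy grid is tiny; the doubling is again a hypothetical stand-in for the real operation.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset standing in for the real one.
times = pd.to_datetime(["2000-12-31", "2001-12-31", "2002-12-31"])
lat = np.array([72.0, 71.25], dtype=np.float32)
lon = np.array([-22.5, -21.75, -21.0], dtype=np.float32)
tas = np.arange(18, dtype=np.float64).reshape(3, 2, 3)
ds = xr.Dataset(
    {"tas": (("time", "latitude", "longitude"), tas)},
    coords={"time": times, "latitude": lat, "longitude": lon},
)
df = ds["tas"].to_dataframe().reset_index()
df_groups = df.groupby(["latitude", "longitude"])["tas"].agg(list).reset_index()
arrays = np.vstack(df_groups["tas"].to_numpy()) * 2.0  # stand-in for tas_new
n_time = arrays.shape[1]

# Flatten row-major: each cell's time series stays contiguous, so latitude and
# longitude repeat n_time times while the time axis is tiled once per cell.
long_new = pd.DataFrame({
    "latitude": np.repeat(df_groups["latitude"].to_numpy(), n_time),
    "longitude": np.repeat(df_groups["longitude"].to_numpy(), n_time),
    "time": np.tile(ds["time"].values, len(df_groups)),
    "tas_new": arrays.ravel(),
})

# Attach tas_new to the original long-format frame by its (time, latitude, longitude) key.
df = df.merge(long_new, on=["time", "latitude", "longitude"], how="left")

# Convert in latitude chunks to limit peak memory, then concatenate along latitude.
lat_chunks = np.array_split(df["latitude"].unique(), 2)
parts = [
    df[df["latitude"].isin(chunk)]
    .set_index(["time", "latitude", "longitude"])
    .to_xarray()
    for chunk in lat_chunks
]
ds_new = xr.concat(parts, dim="latitude").reindex_like(ds)
print(ds_new)
```

Splitting along a single dimension keeps each part a complete (time, longitude) slab, so concatenating along that dimension reassembles the full grid. The final reindex_like guards against the converted parts coming back with a different latitude order than the original dataset.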