python dataframe netcdf python-xarray

Constructing a NetCDF where variable consists of data and error data


Quite a simple question but I cannot seem to find a solution.

I have a time dimension, years t1 to t2. Then I have a few variables, let's say [x, y, z], and all of them have error values associated with them. The current CSV file looks like this:

year x x_err y y_err
1 n n n n
2 n n n n
3 n n n n

I'd like to make a NetCDF file with two variables, [x_variable] and [y_variable], each of which would be two-dimensional (that is, each would hold both the value and the error value).

I know how to make a simple DataFrame and turn that into a NetCDF file, i.e.

import pandas as pd
import xarray as xr

# Years, x, x_err, y and y_err are equal-length sequences
Data = pd.DataFrame({'Year': Years,
                     'x': x,
                     'x_err': x_err,
                     'y': y,
                     'y_err': y_err
                     })
Data = xr.Dataset.from_dataframe(Data.set_index(["Year"]))
Data.to_netcdf("NetCDF.nc")

This creates a standard NetCDF where year is the dimension and I have four variables.

But how can I make a NetCDF where instead of "x" and "x_err" being different variables, they would be a single variable, where "x" would be x_variable[0] and "x_err" would be x_variable[1]? This would be a significant improvement to my workflow.

Data = pd.DataFrame({'Year': Years,
                     'x': [x,x_err],
                     'y': [y,y_err]
                      }) 

This does not work, since 'x' and 'y' are treated as having a length of just two. Is there a way to do this?


Solution

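  • You can do that by stacking the variables into a single array with Dataset.to_stacked_array. As an example for x:

    import pandas as pd
    import xarray as xr

    # Sample data: values and errors for x and y, keyed by row label
    test = {"x":{"1":3,"2":4,"3":5},"x_err":{"1":0.3,"2":0.4,"3":0.5},"y":{"1":30,"2":40,"3":50},"y_err":{"1":300,"2":400,"3":500}}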
    
    df = pd.DataFrame(test)
    
    data = xr.Dataset.from_dataframe(df)
    data[["x", "x_err"]].to_stacked_array("x", ["index"])
    
    >>> <xarray.DataArray 'x' (index: 3, x: 2)>
    array([[3. , 0.3],
           [4. , 0.4],
           [5. , 0.5]])
    Coordinates:
      * index     (index) object '1' '2' '3'
      * x         (x) object MultiIndex
      * variable  (x) object 'x' 'x_err'
    

    and then you can index like this:

    # Row 0 (index '1'), column 1 (x_err)
    stacked = data[["x", "x_err"]].to_stacked_array("x", ["index"])
    stacked[0,1]
    
    >>> <xarray.DataArray 'x' ()>
    array(0.3)
    Coordinates:
        index     <U1 '1'
        x         object ('x_err',)
        variable  <U5 'x_err'
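
One caveat with the stacked result is that its new dimension carries a MultiIndex, which, as far as I know, xarray cannot write to NetCDF without resetting the index first. As an alternative, here is a minimal sketch that builds each variable with an explicit value/error dimension from the start, so that x_variable[0] is the value and x_variable[1] is the error. The sample numbers, the dimension name "kind" and the labels "value" and "error" are assumptions made for illustration, not something from the question.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical sample data in the same layout as the CSV in the question
df = pd.DataFrame(
    {
        "x": [3.0, 4.0, 5.0],
        "x_err": [0.3, 0.4, 0.5],
        "y": [30.0, 40.0, 50.0],
        "y_err": [300.0, 400.0, 500.0],
    },
    index=pd.Index([1, 2, 3], name="year"),
)

# "kind" is an assumed name for the extra value/error dimension
kinds = ["value", "error"]

# For each variable, stack the value column and its *_err column
# into one array of shape (kind, year)
ds = xr.Dataset(
    {
        name: (
            ("kind", "year"),
            np.stack([df[name].to_numpy(), df[f"{name}_err"].to_numpy()]),
        )
        for name in ["x", "y"]
    },
    coords={"kind": kinds, "year": df.index.to_numpy()},
)

x_value = ds["x"][0]  # equivalent to ds["x"].sel(kind="value")
x_error = ds["x"][1]  # equivalent to ds["x"].sel(kind="error")
```

Because every coordinate here is a plain one-dimensional index, calling ds.to_netcdf("NetCDF.nc") as in the question should work on this dataset, provided a NetCDF backend is installed.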