Quite a simple question, but I cannot seem to find a solution.
I have a time dimension, years t1 to t2. I also have a few variables, let's say [x, y, z], and each of them has an associated error value. The current CSV file looks like this:
year | x | x_err | y | y_err |
---|---|---|---|---|
1 | n | n | n | n |
2 | n | n | n | n |
3 | n | n | n | n |
I'd like to make a NetCDF file with two variables, [x_variable] and [y_variable], each of which would be two-dimensional (one dimension for the year, and one holding the value and its error value).
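For illustration, the layout described above can be built directly with xarray from NumPy arrays. This is a minimal sketch, not a definitive recipe: the sample values and the dimension name `stat` (index 0 = value, index 1 = error) are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical example values; in practice these come from the CSV columns
years = np.array([1, 2, 3])
x, x_err = np.array([3.0, 4.0, 5.0]), np.array([0.3, 0.4, 0.5])
y, y_err = np.array([30.0, 40.0, 50.0]), np.array([300.0, 400.0, 500.0])

# "stat" is a hypothetical dimension name: position 0 holds the value, position 1 the error
ds = xr.Dataset(
    {
        # np.stack puts the value row first and the error row second -> shape (2, n_years)
        "x_variable": (("stat", "year"), np.stack([x, x_err])),
        "y_variable": (("stat", "year"), np.stack([y, y_err])),
    },
    coords={"stat": ["value", "error"], "year": years},
)
```

Here `ds["x_variable"][0]` is x and `ds["x_variable"][1]` is x_err; `ds.to_netcdf(...)` would then write it like any other Dataset.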
I know how to make a simple DataFrame and turn it into a NetCDF file, e.g.:
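```python
import pandas as pd
import xarray as xr

# Years, x, x_err, y, y_err are equal-length 1-D arrays read from the CSV
Data = pd.DataFrame({'Year': Years,
                     'x': x,
                     'x_err': x_err,
                     'y': y,
                     'y_err': y_err})
Data = xr.Dataset.from_dataframe(Data.set_index(["Year"]))
Data.to_netcdf("NetCDF.nc")
```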
This creates a standard NetCDF file in which Year is the only dimension and x, x_err, y and y_err are four separate variables.
But how can I make a NetCDF file where, instead of "x" and "x_err" being different variables, they are a single variable, with "x" as x_variable[0] and "x_err" as x_variable[1]? This would significantly improve my workflow.
I tried:
```python
Data = pd.DataFrame({'Year': Years,
                     'x': [x, x_err],
                     'y': [y, y_err]})
```
but this does not work, because pandas treats 'x' and 'y' as columns of length two (one entry per array) rather than one row per year. Is this possible at all?
You can do this by stacking the variables into one array with `Dataset.to_stacked_array`. As an example for x:
```python
import pandas as pd
import xarray as xr

test = {"x":{"1":3,"2":4,"3":5},"x_err":{"1":0.3,"2":0.4,"3":0.5},"y":{"1":30,"2":40,"3":50},"y_err":{"1":300,"2":400,"3":500}}
df = pd.DataFrame(test)
data = xr.Dataset.from_dataframe(df)  # the unnamed index becomes the "index" dimension
data[["x", "x_err"]].to_stacked_array("x", ["index"])
```
which returns:
```
<xarray.DataArray 'x' (index: 3, x: 2)>
array([[3. , 0.3],
       [4. , 0.4],
       [5. , 0.5]])
Coordinates:
  * index     (index) object '1' '2' '3'
  * x         (x) object MultiIndex
  * variable  (x) object 'x' 'x_err'
```
You can then index into it like this:
```python
test = data[["x", "x_err"]].to_stacked_array("x", ["index"])
test[0, 1]  # first "index" entry, second stacked variable (x_err)
```
which gives:
```
<xarray.DataArray 'x' ()>
array(0.3)
Coordinates:
    index     <U1 '1'
    x         object ('x_err',)
    variable  <U5 'x_err'
```
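Note that `to_stacked_array` puts a pandas MultiIndex on the new dimension, and xarray cannot write a MultiIndex to netCDF directly. A minimal sketch of an alternative, assuming the CSV layout from the question (the sample values, the `pair` helper and the `stat` dimension name are hypothetical), is to concatenate each value/error pair along a new dimension that carries plain string labels:

```python
import io

import pandas as pd
import xarray as xr

# Hypothetical CSV contents mirroring the table in the question
csv_text = """year,x,x_err,y,y_err
1,3.0,0.3,30.0,300.0
2,4.0,0.4,40.0,400.0
3,5.0,0.5,50.0,500.0
"""
df = pd.read_csv(io.StringIO(csv_text), index_col="year")
data = xr.Dataset.from_dataframe(df)  # single dimension: year


def pair(ds, name):
    """Hypothetical helper: join `name` and `name_err` along a new 'stat' dimension."""
    # concat prepends the new dimension, so dims become ("stat", "year")
    combined = xr.concat([ds[name], ds[f"{name}_err"]], dim="stat")
    # Plain string labels instead of a MultiIndex, so the result stays netCDF-friendly
    return combined.assign_coords(stat=["value", "error"])


out = xr.Dataset({f"{name}_variable": pair(data, name) for name in ["x", "y"]})
```

With this layout `out["x_variable"][0]` is x and `out["x_variable"][1]` is x_err, label-based selection such as `out["x_variable"].sel(stat="error", year=2)` also works, and `out.to_netcdf("NetCDF.nc")` should write it like any other Dataset, given an installed netCDF backend.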