I have more than 200 time-series variables in a .csv file and want to write all of them to a netCDF file, but I can't get this to work in a loop. The code, with pseudo data, is given below:
import numpy as np
import pandas as pd
from netCDF4 import Dataset, date2num
vars = ['one', 'two', 'three', 'four']
date = pd.date_range(start='2021-01-01', end='2021-01-12')
data_dict = {k: np.random.rand(12) for k in vars}
data = pd.DataFrame(data_dict, index=date)
try:
    # just to be safe, make sure dataset is not already open.
    ncfile.close()
except:
    pass
ncfile = Dataset('test.nc', mode='w', format='NETCDF4_CLASSIC')
lat_dim = ncfile.createDimension('lat', 1)
lon_dim = ncfile.createDimension('lon', 1)
time_dim = ncfile.createDimension('time', None)
time = ncfile.createVariable('time', np.float64, ('time',))
time.units = 'Minutes since 2021-01-01 0'
time.long_name = 'time'
calendar = 'standard'
time[:] = date2num(
    (pd.to_datetime(data.index)).to_pydatetime(),
    units=time.units,
    calendar=calendar
)
for i, vname in enumerate(vars):
    vname = ncfile.createVariable(vname,np.float64,'time')
    vname[:] = data[vname].values
I think the problem here is 'vname', which is a string. I tried to convert it into an object but was not able to. I am not sure whether I am right about this.
Using xarray, you could write something like:
import numpy as np
import pandas as pd
variables = ['one', 'two', 'three', 'four']
date = pd.date_range(start='2021-01-01', end='2021-01-12')
data_dict = {k: np.random.rand(12) for k in variables}
data = pd.DataFrame(data_dict, index=date)
data.index.name = 'time'
ds = data.to_xarray().expand_dims(dim=['lat', 'lon'])
ds.to_netcdf('test.nc', format='NETCDF4')
The to_xarray method will give you an xarray.Dataset with time as a coordinate and your four variables. The expand_dims call then adds the two spatial dimensions, lat and lon, each of length 1.
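To check what that produces before writing anything to disk, you can look at the dataset's dimensions. This is only a quick sketch on the same pseudo data; it assumes xarray is installed, since pandas needs it for to_xarray.

```python
import numpy as np
import pandas as pd

# Same pseudo data as in the snippet above.
variables = ['one', 'two', 'three', 'four']
date = pd.date_range(start='2021-01-01', end='2021-01-12')
data = pd.DataFrame({k: np.random.rand(12) for k in variables}, index=date)
data.index.name = 'time'

# DataFrame -> xarray.Dataset (requires xarray), then add two length-1 dimensions.
ds = data.to_xarray().expand_dims(dim=['lat', 'lon'])

print(ds.sizes)          # length of each dimension
print(ds['one'].dims)    # dimension order of one data variable
print(list(ds.data_vars))
```

Each variable ends up with the new dimensions in front of time, so you may want to transpose if your downstream tools expect time first.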
Tell me if the resulting dataset/netCDF file is not exactly what you want.