Tags: python-3.x, netcdf, python-xarray, enumerate, netcdf4

How to write variables in a loop while creating netCDF file in Python


I have more than 200 time-series variables in a .csv file and want to write all of them to a netCDF file, but I can't work out how to do that in a loop. The code, with dummy data, is given below:

Generate data

import numpy as np
import pandas as pd
from netCDF4 import Dataset, date2num

vars = ['one', 'two', 'three', 'four']
date = pd.date_range(start='2021-01-01', end='2021-01-12')
data_dict = {k: np.random.rand(12) for k in vars}
data = pd.DataFrame(data_dict, index=date)

Create netCDF file with dimensions

try:
    # just to be safe, make sure dataset is not already open.
    ncfile.close()
except:
    pass

ncfile = Dataset('test.nc', mode='w', format='NETCDF4_CLASSIC')

lat_dim = ncfile.createDimension('lat', 1)    
lon_dim = ncfile.createDimension('lon', 1)    

time_dim = ncfile.createDimension('time', None) 
time = ncfile.createVariable('time', np.float64, ('time',))
time.units = 'Minutes since 2021-01-01 0'
time.long_name = 'time'
    
calendar = 'standard'
time[:] = date2num(
    (pd.to_datetime(data.index)).to_pydatetime(),
    units=time.units,
    calendar=calendar
)

Write netcdf file in a loop

for i, vname in enumerate(vars):
    vname = ncfile.createVariable(vname,np.float64,'time')
    vname[:] = data[vname].values

I think the problem here is 'vname', which is a string. I tried to convert it into an object but was not able to do that. I am not sure whether I am wrong.


Solution

  • Using xarray, you could write something like:

    import numpy as np
    import pandas as pd
    
    variables = ['one', 'two', 'three', 'four']
    date = pd.date_range(start='2021-01-01', end='2021-01-12')
    data_dict = {k: np.random.rand(12) for k in variables}
    data = pd.DataFrame(data_dict, index=date)
    data.index.name = 'time'
    
    ds = data.to_xarray().expand_dims(dim=['lat', 'lon'])
    ds.to_netcdf('test.nc', format='NETCDF4')
    

    The to_xarray method gives you an xarray.Dataset with time as a coordinate and your four columns as data variables. The expand_dims call then adds the two spatial dimensions, lat and lon, each of length 1.

    Tell me if the resulting dataset/netCDF file is not exactly what you want.
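
To check the structure before writing anything to disk, the dataset can be inspected directly. This is a small sketch assuming xarray is installed alongside pandas and NumPy. It rebuilds the same dummy data and prints the dimension sizes and variable names:

```python
import numpy as np
import pandas as pd

variables = ['one', 'two', 'three', 'four']
date = pd.date_range(start='2021-01-01', end='2021-01-12')
data = pd.DataFrame({k: np.random.rand(12) for k in variables}, index=date)
data.index.name = 'time'  # the index name becomes the dimension name

# Convert to an xarray.Dataset and add two length-1 spatial dimensions.
ds = data.to_xarray().expand_dims(dim=['lat', 'lon'])

print(dict(ds.sizes))      # size of each dimension
print(list(ds.data_vars))  # one data variable per DataFrame column
print(ds['one'].dims)      # dimensions attached to a single variable
```

Note that lat and lon are added here as dimensions without coordinate values. If real latitude and longitude values are needed, they could be attached afterwards, for example with ds.assign_coords.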