Tags: netcdf, nco

ncks append slow for multiple small netcdf files


My model produces one netCDF file per timestep and per variable, named DDDDDDD.VVV.nc, where DDDDDDD is the date and VVV is the variable name.

For each timestep, I'm using nco to append the files corresponding to the different variables, in order to get one file per timestep.

#!/bin/bash
# Loop over timesteps and append all variable files into one file per timestep.
# One variable ('O2o') is used to enumerate the timesteps.
for f in *.O2o.nc; do
  timestep=$(echo "$f" | cut -b -21)
  echo "$timestep"
  for var in "$timestep"*.nc; do
    ncks -A -h "$var" "F1_${timestep}.nc"
  done
done

There are about 432 output variables, and each file is about 6.4K or 1.1K (the variables do not all have the same number of dimensions).

I find the process very slow (about 15 seconds per timestep), even though the files are very small. Any idea how I could optimize the script?


Solution

  • The slowness is probably due to opening, moving data, appending, and closing files 432 times per timestep. To optimize this, reduce the number of file operations, in particular the repeated appends, each of which rewrites the output file. Try writing all the data into one netCDF4 file at once (as groups), then flatten that file into netCDF3. For each timestep it will look like this:

    ncecat --gag in*.nc all_group.nc
    ncks -3 -G : all_group.nc all_flat.nc
    

    Two commands instead of 432. If any variable appears in more than one input file, you will get an error saying the variable would be multiply defined in all_flat.nc. Avoid this by removing the duplicate inputs.
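
    Put together, the two commands can replace the inner append loop in the original script. Below is a minimal sketch, assuming the same DDDDDDD.VVV.nc naming and 21-character timestep prefix as in the question; the intermediate and output filenames are illustrative:

    ```shell
    #!/bin/bash
    # Sketch: the per-timestep loop from the question, rewritten to use the
    # two-command group approach. The 21-character timestep prefix and the
    # 'O2o' marker variable are assumptions carried over from the question.
    shopt -s nullglob   # make the loop a no-op if no files match
    for f in *.O2o.nc; do
      timestep=$(echo "$f" | cut -b -21)
      # two NCO calls per timestep instead of 432 appends:
      # 1) aggregate all variable files for this timestep into one grouped file
      ncecat --gag "$timestep"*.nc "all_group_${timestep}.nc"
      # 2) flatten the groups into a single netCDF3 file
      ncks -3 -G : "all_group_${timestep}.nc" "F1_${timestep}.nc"
    done
    ```

    The intermediate all_group files can be deleted afterwards; with `nullglob` set, the script exits cleanly when run in a directory with no matching inputs.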