Tags: bash, parallel-processing, netcdf, gnu-parallel

Pass in array to GNU Parallel to replace for loop


a) I want to run two scripts in parallel.

b) I want to run the for loops within those scripts in parallel.

Previously, I had this code:

for year in 2000 2001 2002 2003; do

  echo "$year LST data being merged"

  cd "$base_data_dir/$year"

  # this is the part that takes a long time
  cdo -f nc2 mergetime *.nc "$output_dir/LST_$year.nc"

done

I want to use GNU Parallel to run this in parallel.

I tried the following:

a) Create a 'controller' script that calls other scripts

b) Pass in an array of years as arguments to GNU Parallel

The controller script

# 1. Create monthly LST for each year

cd "$working_dir"
seq 2000 2003 | parallel 'bash create_yearly_LST_files.sh {}'

# 2. Create monthly NDVI for each year

cd "$working_dir"
seq 2000 2003 | parallel 'bash create_yearly_NDVI_files.sh {}'

This should run the following commands in parallel:

bash create_yearly_LST_files.sh 2000
bash create_yearly_LST_files.sh 2001
...

bash create_yearly_NDVI_files.sh 2000
bash create_yearly_NDVI_files.sh 2001
...

The processing script (the same for NDVI)

year="$1"
echo "$year LST data being merged"
# stop here if the year directory is missing, instead of merging the wrong files
cd "$base_data_dir/$year" || exit 1

cdo -f nc2 mergetime *.nc "$output_dir/LST_$year.nc"

So the commands should read:

cd $base_data_dir/2000
cdo -f nc2 mergetime *.nc $output_dir/LST_2000.nc

cd $base_data_dir/2001
cdo -f nc2 mergetime *.nc $output_dir/LST_2001.nc
...

cd $base_data_dir/2000
cdo -f nc2 mergetime *.nc $output_dir/NDVI_2000.nc

cd $base_data_dir/2001
cdo -f nc2 mergetime *.nc $output_dir/NDVI_2001.nc
...

My Question:

The new code still produces the right output, but it is no faster than the original loop.

How can I get each year to run in parallel?

And how can I also run both scripts (create_yearly_LST_files.sh and create_yearly_NDVI_files.sh) in parallel?


Solution

  • What is stopping you from running each cdo call in the background with & and then waiting for all of them:

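    for year in 2000 2001 2002 2003; do
      echo "$year LST data being merged"
      # run each merge in the background; the subshell keeps the cd local to that job
      (
        cd "$base_data_dir/$year" || exit 1
        # this is the part that takes a long time
        cdo -f nc2 mergetime *.nc "$output_dir/LST_$year.nc"
      ) &
    done
    # block until every background merge has finished
    wait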
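The same idea extends to both variables without GNU Parallel: start one background job per (variable, year) pair, then wait for all of them. This is a minimal sketch, assuming base_data_dir and output_dir are already set and, as in the question, that both the LST and NDVI inputs live in $base_data_dir/$year; the function names merge_year and merge_all are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical function names): merge LST and NDVI for every year
# concurrently using plain background jobs.
# Assumes base_data_dir and output_dir are already set, and (as in the
# question) that both variables read their inputs from $base_data_dir/$year.

# Merge all *.nc files for one variable and one year into a single file.
merge_year() {
  local var="$1" year="$2"
  echo "$year $var data being merged"
  # Subshell: the cd never changes the caller's directory,
  # even if merge_year is called in the foreground.
  (
    cd "$base_data_dir/$year" || exit 1
    cdo -f nc2 mergetime *.nc "$output_dir/${var}_${year}.nc"
  )
}

# Start one background job per (variable, year) pair, then wait for all.
merge_all() {
  local var year
  for var in LST NDVI; do
    for year in 2000 2001 2002 2003; do
      merge_year "$var" "$year" &
    done
  done
  wait
}
```

Call merge_all after defining the functions. Unlike GNU Parallel, this starts all eight jobs at once with no cap on concurrency; with many more years, GNU Parallel's -j option is the safer way to limit how many merges compete for CPU and disk.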