Tags: bash, parallel-processing, gnu-parallel

GNU parallel with for loop function


I would like to utilize all the cores (48) on my AWS instance to run my job. I have 6 million entries in my list, and each job runs for less than a second [real 0m0.004s, user 0m0.005s, sys 0m0.000s]. The following invocation uses all the cores, but NOT at 100%:

parallel -a list.lst --load 100% --joblog process.log sh job_run.sh {} >>score.out

job_run.sh

#!/bin/bash
# Run one job: $1 is a list entry of the form <id>-<m>-<n>.
i=$1
TMP_DIR=/home/ubuntu/test/$i

# Use a private scratch directory so concurrent jobs do not collide.
mkdir -p "$TMP_DIR"
cd "$TMP_DIR" || exit 1

# Fields 2 and 3 of the '-'-separated entry name the two input files.
m=$(echo "$i" | awk -F '-' '{print $2}')
n=$(echo "$i" | awk -F '-' '{print $3}')
cp "/home/ubuntu/aligned/$m" "$TMP_DIR/"
cp "/home/ubuntu/aligned/$n" "$TMP_DIR/"

# Print the entry, then the program's 'GA' line, on one output line.
printf '%s ' "$i"
/home/ubuntu/test/prog -s1 "$m" -s2 "$n" | grep 'GA'

# Clean up the scratch directory.
cd ..
rm -rf "$TMP_DIR"
exit 0

Solution

  • Your problem is GNU Parallel's overhead: it takes 5-10 ms to start a job, so you will likely see GNU Parallel running at 100% on one core while the rest sit idle.
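
    A quick way to see this startup overhead in isolation is to time a batch of no-op jobs. This is a rough sketch, not a benchmark: ids.txt is a stand-in input file, and the exact timings vary by machine.

    seq 1000 > ids.txt
    # Each no-op job still pays Parallel's ~5-10 ms startup cost:
    time parallel true :::: ids.txt
    # A plain shell loop over the same input has almost no per-iteration cost:
    time while read -r line; do :; done < ids.txt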

    But you can run multiple GNU Parallels: https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Speeding-up-fast-jobs

    So split the list into smaller chunks and run those in parallel:

    cat list.lst | parallel --block 100k -q -I,, --pipe parallel --joblog process.log{#} sh job_run.sh {} >>score.out
    

    This runs 48+1 GNU Parallels, so it should use all your cores. Most of the cores will be spent on overhead, because your jobs are so fast.
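
    In the command above, {#} is the outer Parallel's job number (i.e. the chunk number), so each inner Parallel writes its own job log: process.log1, process.log2, and so on. A tiny sketch of the expansion (output order may vary):

    seq 3 | parallel echo chunk {} logs to process.log{#}
    # chunk 1 logs to process.log1
    # chunk 2 logs to process.log2
    # chunk 3 logs to process.log3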

    If you are not using the process.log, then it can be done with less overhead:

    perl -pe 's/^/sh job_run.sh /' list.lst | parallel --pipe --block 100k sh >>score.out
    

    This prepends sh job_run.sh to each line and hands 100 KB blocks of such lines to 48 sh processes running in parallel, as in the sketch below.
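
    A minimal demonstration of the transformation (the entry names job-a-b and job-c-d are made up for illustration):

    printf 'job-a-b\njob-c-d\n' | perl -pe 's/^/sh job_run.sh /'
    # sh job_run.sh job-a-b
    # sh job_run.sh job-c-d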