I would like to use all 48 cores of my AWS instance to run my job. I have 6 million lists to process, and each job runs for less than a second [real 0m0.004s user 0m0.005s sys 0m0.000s]. The following command uses all the cores, but NOT at 100%.
gnu_parallel -a list.lst --load 100% --joblog process.log sh job_run.sh {} >>score.out
job_run.sh
#!/bin/bash
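# $1 is one entry from list.lst; fields 2 and 3 (split on '-') name the input files
i=$1
TMP_DIR=/home/ubuntu/test/$i
mkdir -p "$TMP_DIR"
cd "$TMP_DIR"/
m=$(echo "$i" | awk -F '-' '{print $2}')
n=$(echo "$i" | awk -F '-' '{print $3}')
cp /home/ubuntu/aligned/"$m" "$TMP_DIR"/
cp /home/ubuntu/aligned/"$n" "$TMP_DIR"/
# print the entry name, then the matching 'GA' line(s) from prog on the same output line
printf '%s ' "$i"
/home/ubuntu/test/prog -s1 "$m" -s2 "$n" | grep 'GA'
cd "$TMP_DIR"/../
rm -rf "$TMP_DIR"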
exit 0
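Your problem is GNU Parallel's overhead: it takes 5-10 ms to start a job, while each of your jobs finishes in about 4 ms. A single GNU Parallel therefore cannot start jobs fast enough to keep 48 cores busy, so you will likely see GNU Parallel running at 100% on one core while the rest are mostly idle.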
But you can run multiple GNU Parallels: https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Speeding-up-fast-jobs
So split the list into smaller chunks and run those in parallel:
cat list.lst | parallel --block 100k -q -I,, --pipe parallel --joblog process.log{#} sh job_run.sh {} >>score.out
This should run 48+1 GNU Parallels (one that splits the input plus 48 that run the jobs), so it should use all your cores. Most of your cores will be used for overhead because your jobs are so fast.
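As a rough back-of-the-envelope check of why splitting helps, the sketch below multiplies the 6 million jobs by the 5-10 ms start-up overhead quoted above and divides by the number of GNU Parallel instances. It assumes perfect scaling across 48 instances and ignores the jobs' own run time, so treat the numbers as an order-of-magnitude illustration only; estimate_dispatch_seconds is a made-up helper name, not part of GNU Parallel.

```shell
#!/bin/sh
# Hypothetical helper: seconds spent only on starting jobs.
#   $1 = number of jobs, $2 = start overhead per job in ms,
#   $3 = number of GNU Parallel instances (assumes perfect scaling)
estimate_dispatch_seconds() {
  echo $(( $1 * $2 / 1000 / $3 ))
}

total_jobs=6000000
for overhead_ms in 5 10; do
  single=$(estimate_dispatch_seconds "$total_jobs" "$overhead_ms" 1)
  split=$(estimate_dispatch_seconds "$total_jobs" "$overhead_ms" 48)
  echo "overhead ${overhead_ms} ms: 1 instance ~${single}s, 48 instances ~${split}s"
done
```

Under these assumptions a single instance would spend roughly 8 to 17 hours just starting jobs, against roughly 10 to 21 minutes when the work is spread over 48 instances.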
If you are not using the process.log, then it can be done with less overhead:
perl -pe 's/^/sh job_run.sh /' list.lst | parallel --pipe --block 100k sh >>score.out
This will prepend each line with sh job_run.sh and give 100 kB of lines at a time to 48 sh processes running in parallel.