Tags: bash, parallel-processing, gnu-parallel

How can I run n jobs in bash and start a new task as soon as the first one finishes?


When I have a series of jobs in a bash script (for instance in joblist.sh, where each line contains one job) and n available CPUs on my computer, I parallelize them by appending & to every line and inserting wait after every n lines. This significantly speeds up processing, but it is not optimal: it waits until all n tasks finish before starting the next n. It would be much better to wait until any one job is done and then start the next job from the queue, also taking limited free memory into account. Is there a technique in bash, or a tool that can be used without root installation on the server, that could help with this? You can answer with GNU Parallel, but I would prefer a plain bash solution that does not use GNU Parallel.
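For concreteness, the batch pattern described above with n = 4 looks like this (job names and sleep durations are hypothetical stand-ins for real work):

```shell
#!/bin/bash
# Batch pattern from the question: start n jobs in the background,
# then `wait` blocks until ALL n have finished before the next batch starts.
job() { sleep "$1"; echo "job $2 done"; }

job 0.4 1 &  job 0.1 2 &  job 0.1 3 &  job 0.2 4 &
wait    # idles until the slowest of the 4 finishes, even though 3 slots are free
job 0.1 5 &  job 0.1 6 &
wait
```

The inefficiency is the first `wait`: three CPU slots sit idle while the slowest job of the batch runs.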

One solution (according to Parallelize Bash script with maximum number of processes), without considering free memory, is:

cat joblist.sh | parallel -j 12

My bash script for creating a parallelized job list (n = 12):

awk '{print $0"  &"}' joblist.sh > joblist1.sh
awk '1;!(NR%12){print "wait";}' joblist1.sh > joblist_parallel.sh
chmod +x joblist_parallel.sh
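To see what the two awk passes produce, here is the same transformation applied to a tiny hypothetical 3-line job list, with n = 2 instead of 12:

```shell
#!/bin/bash
# Demonstrate the two awk passes on a 3-line job list (n = 2).
cd "$(mktemp -d)"
printf './task a\n./task b\n./task c\n' > joblist.sh

awk '{print $0"  &"}' joblist.sh > joblist1.sh            # append " &" to each line
awk '1;!(NR%2){print "wait";}' joblist1.sh > joblist_parallel.sh

cat joblist_parallel.sh
# ./task a  &
# ./task b  &
# wait
# ./task c  &
```

The first pass backgrounds every command; the second prints each line (`1`) and emits `wait` after every second line (`NR%2`).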

Solution

  • Parallelizing jobs is not a simple task, and GNU Parallel is the right tool. But if you want to stick to bash, one solution is to use jobs.

    jobs -lr will list all the tasks you started in the background with &

    #!/bin/bash
    
    my_complex_bash_job () {
        # A fake job: a random id, a log line, a random sleep, a log line.
        JOB=$(echo "$RANDOM" | md5sum | cut -d' ' -f1)
        printf '[%s] JOB %02d begin %s\n' "$(date +'%F %T')" "$1" "$JOB"
        sleep $(( 4 * (2 + RANDOM % 10) ))
        printf '[%s] JOB %02d end   %s\n' "$(date +'%F %T')" "$1" "$JOB"
    }
    
    # Start each job in the background, then poll once a second until
    # the number of running jobs drops to 3 or fewer, so that at most
    # 4 jobs run concurrently and a new one starts as soon as a slot frees.
    for J in $(seq 1 20)
    do
        my_complex_bash_job "$J" &
        while [ "$(jobs -lr | wc -l)" -gt 3 ]
        do
            sleep 1
        done
    done
    wait   # let the last jobs finish
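If the server's bash is 4.3 or newer, `wait -n` avoids the one-second polling loop entirely: it blocks only until any single background job exits, so a new job starts the moment a slot frees. A minimal sketch under that assumption, with a short `sleep` standing in for a real job:

```shell
#!/bin/bash
# Alternative to the polling loop (requires bash >= 4.3):
# `wait -n` returns as soon as ANY one background job exits.
max=4
for J in $(seq 1 20)
do
    sleep 0.1 &                        # stand-in for my_complex_bash_job "$J"
    while [ "$(jobs -r | wc -l)" -ge "$max" ]
    do
        wait -n                        # block until one job finishes
    done
done
wait        # let the remaining jobs drain
```

This answers the original question directly: no slot ever sits idle waiting for a whole batch to finish.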