Tags: bash, wait, background-process

How to wait for multiple backgrounded jobs to finish without losing their exit codes?


Some of my scripts kick off a number of children in the background and then wait for their completion. Currently I invoke wait in a loop for each PID:

for pid in "${!PIDs[@]}"
do
        if wait "$pid"
        then
                log ${PIDs[$pid]} completed successfully
        else
                log WARNING: ${PIDs[$pid]} failed
                errors=$((errors + 1))  # plain 'errors+=1' appends the string "1" unless errors was declared with -i
        fi
done

This works and lets me analyze and process failures, but the results are handled in the order in which the PIDs are listed, not in the order in which the processes actually complete. That is, the 5th process may finish first, but its exit code will not be processed until the first four are done...

As far as I know, sh provides two modes for wait:

  1. Bare wait waits for all backgrounded jobs to finish, but it always "succeeds" (returns 0), losing the exit codes of the backgrounded processes.
  2. wait PID waits for the specified process. This provides the exit code, but it can only wait for that one process.
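
The difference between the two modes can be seen in a small test. This is only an illustrative sketch: the variable names single_rc and bare_rc are made up for the example, and the children are trivial subshells rather than real jobs.

```shell
#!/usr/bin/env bash
# Mode 2: wait for one specific child and pick up its exit status.
( exit 3 ) &
bad_pid=$!
single_rc=0
wait "$bad_pid" || single_rc=$?   # single_rc becomes the child's status

# Mode 1: bare wait blocks until every child is done, then returns 0.
( exit 3 ) &
true &
bare_rc=0
wait || bare_rc=$?                # stays 0 although one child failed

echo "wait PID: $single_rc, bare wait: $bare_rc"
```

Running it should show that only the wait PID form reports the failing status.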

But maybe bash has improved on the old sh here? Is there a way to make bash's wait return as soon as any of the backgrounded processes completes, and have it report both the finished PID and its exit code?

The underlying C functions, waitpid and friends, can do this if you pass a PID of -1. I tried doing that with bash and got an error...


Solution

  • Might take a few steps.

    Here is a test that polls each PID once per second with ps and calls wait on any process that has exited. One process is killed to show that non-zero exit codes are caught.

    $: cat tst
    #! /usr/bin/env bash
    for x in 1 3 5 7 9; do sleep $x & done
    declare -A rc=()
    pids=($(jobs -pr))
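    while (( ${#pids[@]} ))
    do for k in "${!pids[@]}"
       do p=${pids[$k]}
          if ! ps -p "$p" >/dev/null    # process is gone: collect its status
          then wait "$p"; rc[$p]=$?     # subscript form also works before bash 5.1
               unset "pids[$k]"
               date +"%F %T PID $p: rc ${rc[$p]}"
          fi
       done
       ((skip++)) || kill "${pids[3]}"  # first pass only: kill the 4th job (sleep 7)
       sleep 1
    done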
    
    $: ./tst
    2025-02-07 14:55:19 PID 3036: rc 0
    2025-02-07 14:55:19 PID 3039: rc 143
    2025-02-07 14:55:21 PID 3037: rc 0
    2025-02-07 14:55:23 PID 3038: rc 0
    2025-02-07 14:55:28 PID 3040: rc 0
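
As an alternative to polling with ps, newer bash versions can do what the question asks for directly: wait -n (bash 4.3+) returns as soon as any one child exits, and the -p VARNAME option (bash 5.1+) stores the PID of that child. The following is a minimal sketch under that version assumption, not the approach used in the test above. The start_job helper, the job names, and the label, rc, and finished variables are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 5.1+ ('wait -n' is 4.3+, its '-p VAR' option is 5.1+).
if (( BASH_VERSINFO[0] < 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] < 1) )); then
    echo "this sketch needs bash 5.1 or newer" >&2
    exit 1
fi

declare -A label=()    # PID -> job name (illustrative bookkeeping)
declare -A rc=()       # PID -> exit status
finished=()            # job names in completion order
errors=0

# Hypothetical stand-in for real work: sleep, then exit with a chosen status.
start_job() {          # usage: start_job NAME SECONDS STATUS
    ( sleep "$2"; exit "$3" ) &
    label[$!]=$1
}

start_job slow   3 0
start_job fast   1 0
start_job broken 2 7

while (( ${#rc[@]} < ${#label[@]} )); do
    status=0
    # Block until any one child exits; -p stores that child's PID in done_pid.
    wait -n -p done_pid || status=$?
    [[ -n ${done_pid-} ]] || break     # no unwaited-for children remain
    rc[$done_pid]=$status
    finished+=("${label[$done_pid]}")
    if (( status != 0 )); then
        errors=$((errors + 1))
        echo "WARNING: ${label[$done_pid]} (PID $done_pid) failed with status $status"
    else
        echo "${label[$done_pid]} (PID $done_pid) completed successfully"
    fi
done
echo "order: ${finished[*]}; errors: $errors"
```

Each child is handled at the moment it finishes, so the job started first but sleeping longest is reported last. On bash older than 5.1 the -p option is not available, and a polling loop like the one above remains the fallback.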