Tags: linux, concurrency, pid, exit-code, return-code

Does the PID of a child process become available for reuse if the parent process is still running?


I am running on *nix-based OSes and have a script that starts multiple processes concurrently. My main goal is to start these processes concurrently and gather the exit status of each one. I've found that using wait(pid) achieves this, since all child processes are owned by the parent process. However, I am concerned that once a child process (one of the concurrently started processes) completes, its PID will be released and made available for recycling by the system.

So I guess the question is, if a parent process initiates several child processes concurrently, will the PID of a child process that completes be made available to the system for recycling prior to the parent process completing? If so, how can I best obtain the exit statuses of each of the child processes?

Example of bash script below:

local file=$1
local count=0
# (files are split here and suffixed with aa, ab, ac, ad)

/home/text/concurrencyTest.sh "$file-aa" >> "/home/text/$file-aa.log" 2>&1 &
/home/text/concurrencyTest1.sh "$file-ab" >> "/home/text/$file-ab.log" 2>&1 &
/home/text/concurrencyTest2.sh "$file-ac" >> "/home/text/$file-ac.log" 2>&1 &
/home/text/concurrencyTest3.sh "$file-ad" >> "/home/text/$file-ad.log" 2>&1 &

for job in $(jobs -p)
do
    echo "Job: $job"
    wait "$job"
    rc=$?
    echo "RC for $job is $rc"
    if [[ $rc -ne 0 ]]; then
        FAIL[$count]="$job"
        ((count++))
    fi
done
if [[ $count -ne 0 ]]; then
    echo "ERROR: $count Job(s) Failed!"
    echo "Failed Process PID(s): ${FAIL[@]}"
    echo "Failed Processing for file: $file"
    return 1
fi

Solution

  • The PID of a child process becomes available for reuse when the parent process calls wait or waitpid (or another function in that family, such as wait3 or wait4).

    When the child dies, it stays behind as a zombie: an entry in the process table with no process behind it, kept only to reserve the process ID and store the exit status. Calling waitpid blocks until the designated child process dies (or returns immediately if it's already dead), retrieves the child's status code, and reaps the zombie (i.e. removes the process table entry, freeing the process ID for reuse). Calling wait is similar, but returns as soon as any one child process has died.

    If the parent process is ignoring the SIGCHLD signal at the time the child dies, then the child is not turned into a zombie and its PID becomes available for reuse immediately. The parent's status vis-à-vis SIGCHLD matters in other ways; see e.g. POSIX for the gritty details.

    If the parent process dies before the child, the child is said to be an orphan adopted by init, the process with PID 1. It is part of init's job to reap orphans.

    In a shell script, the wait builtin is a wrapper around the wait system call. If the script has multiple children, wait with no argument blocks until all of them have died, and wait with some arguments blocks until all the specified processes have died (in POSIX sh there's no way to wait until one process has died without specifying which; bash 4.3 and later adds wait -n for that). If wait $pid1 returns, it's possible that $pid2 has already died and that its PID has been reused for another process; however, the shell keeps track of $pid2's status code even so, and a subsequent wait $pid2 will return its status code. You should not fork a new background job until then, however, to avoid confusion in case $pid2 was reused for a background job.
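
The pattern described above can be sketched roughly as follows. This is only an illustration and assumes bash (for arrays). The worker function is a hypothetical stand-in for the concurrencyTest scripts in the question, and the exit statuses it returns are made up for the demonstration. Each PID is recorded from $! right after the job is started, and no further background jobs are forked until every recorded PID has been waited for.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for a real worker script: sleeps briefly,
# then exits with the status passed as its first argument.
worker() {
    sleep 1
    return "$1"
}

# Start all jobs in the background, recording each PID as we go.
pids=()
for status in 0 3 0 7; do
    worker "$status" &
    pids+=("$!")            # $! is the PID of the job just started
done

# Wait for each recorded PID in turn. By the time the first wait
# returns, the other jobs have most likely exited too; the shell
# still remembers their statuses and hands them back here.
statuses=()
failed=()
for pid in "${pids[@]}"; do
    rc=0
    wait "$pid" || rc=$?    # capture the status without tripping set -e
    statuses+=("$rc")
    if (( rc != 0 )); then
        failed+=("$pid")
    fi
done

echo "Exit statuses: ${statuses[*]}"
echo "Failed jobs: ${#failed[@]}"
```

With the made-up statuses above, this should report statuses 0 3 0 7 and two failed jobs. Recording $! at launch time, rather than reading jobs -p afterwards, keeps the list of PIDs tied to exactly the jobs this script started.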