
bash launching additional, spurious background processes


I'm seeing some very strange behavior with bash (version 3.2.25 on RHEL 5.3).

I have a 'Launcher' script that does the following (as a foreground process, running in a terminal that remains open throughout):

  1. It launches program A in the background and then exits.
  2. A then launches two instances (separate processes) of program B in the background (say, B1 and B2).
  3. A also launches one instance of program C in the background.

The idea is for A, C, and the two B instances to communicate with each other until the user kills them. (Each keeps running in a while sleep DURATION; do ... ; done loop.)

The Problem:

After the above 3 steps are complete, when I repeatedly issue ps -ef from another terminal window, I sometimes see a few additional, spurious instances of B (say B3, B4...) and/or sometimes an additional, spurious instance of A being listed!

These additional instances are transient -- they come and go from the ps -ef listing.

Further, these spurious instances happen to be children -- and not siblings -- of the valid (or, the desired) processes. For example, B3 and B4 would list B1 and B2, respectively, as their parent; similarly, the spurious A2 would list A as its parent!

Now, I am PRETTY DARNED SURE that I am in NO WAY creating any additional B instances from within a B, nor any A instance from inside an A.

So, what is going on here?

Many thanks, in advance.

PS: I saw a similar problem (multiple spurious instances) a while back with cron jobs that were designed to hang around indefinitely after their first launch. There too, I would see multiple instances of my cron job even though I had explicit logic in place to prevent crond from launching additional instances (by checking for a lock file on disk). I never managed to figure out that problem either.


$ ps -ejfH 
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
root     28503     1 28474 11126  0 22:14 pts/1    00:00:31   /bin/bash A
root     28525 28503 28474 11126  0 22:14 pts/1    00:00:26     /bin/bash B 
root     16143 28525 28474 11126  0 23:14 pts/1    00:00:00       [B] <defunct>
root     16144 28525 28474 11126  0 23:14 pts/1    00:00:00       /bin/bash B 
root     28531 28503 28474 11126  0 22:14 pts/1    00:00:23     /bin/bash B 
root     28566 28503 28474 11126  0 22:14 pts/1    00:00:01     /bin/bash C

$ ps -ejfH 
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
root     28503     1 28474 11126  0 22:14 pts/1    00:00:31   /bin/bash A
root     28525 28503 28474 11126  0 22:14 pts/1    00:00:26     /bin/bash B 
root     28531 28503 28474 11126  0 22:14 pts/1    00:00:23     /bin/bash B 
root     28566 28503 28474 11126  0 22:14 pts/1    00:00:01     /bin/bash C
root     18579 28503 28474 11126  0 23:14 pts/1    00:00:00     /bin/bash A

$ ps -ejfH
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
root     28503     1 28474 11126  0 22:14 pts/1    00:00:31   /bin/bash A
root     28525 28503 28474 11126  0 22:14 pts/1    00:00:26     /bin/bash B 
root     22717 28525 28474 11126  0 23:14 pts/1    00:00:00       /bin/bash B 
root     22718 22717 28474 11126  0 23:14 pts/1    00:00:00         /bin/bash B 
root     28531 28503 28474 11126  0 22:14 pts/1    00:00:23     /bin/bash B 
root     28566 28503 28474 11126  0 22:14 pts/1    00:00:01     /bin/bash C

$ ps -ejfH
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
root     28503     1 28474 11126  0 22:14 pts/1    00:00:31   /bin/bash A
root     28525 28503 28474 11126  0 22:14 pts/1    00:00:26     /bin/bash B 
root     28531 28503 28474 11126  0 22:14 pts/1    00:00:23     /bin/bash B 
root     28566 28503 28474 11126  0 22:14 pts/1    00:00:01     /bin/bash C

$ ps -ejfH
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
root     28503     1 28474 11126  0 22:14 pts/1    00:00:32   /bin/bash A
root     28525 28503 28474 11126  0 22:14 pts/1    00:00:27     /bin/bash B 
root     28531 28503 28474 11126  0 22:14 pts/1    00:00:24     /bin/bash B 
root     32021 28531 28474 11126  0 23:15 pts/1    00:00:00       /bin/bash B 
root     32023 32021 28474 11126  0 23:15 pts/1    00:00:00         [B] <defunct>
root     28566 28503 28474 11126  0 22:14 pts/1    00:00:01     /bin/bash C
root     32013 28503 28474 11126  0 23:15 pts/1    00:00:00     /bin/bash A

$ ps -ejfH
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
root     28503     1 28474 11126  0 22:14 pts/1    00:00:32   /bin/bash A
root     28525 28503 28474 11126  0 22:14 pts/1    00:00:27     /bin/bash B 
root     28531 28503 28474 11126  0 22:14 pts/1    00:00:24     /bin/bash B 
root     28566 28503 28474 11126  0 22:14 pts/1    00:00:01     /bin/bash C
root      2310 28503 28474 11126  0 23:15 pts/1    00:00:00     /bin/bash A
root      2324  2310 28474 11126  0 23:15 pts/1    00:00:00       /bin/bash A

$ ps -ejfH
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
root     28503     1 28474 11126  0 22:14 pts/1    00:00:32   /bin/bash A
root     28525 28503 28474 11126  0 22:14 pts/1    00:00:27     /bin/bash B 
root     28531 28503 28474 11126  0 22:14 pts/1    00:00:24     /bin/bash B 
root      9219 28531 28474 11126  0 23:16 pts/1    00:00:00       [B] <defunct>
root     28566 28503 28474 11126  0 22:14 pts/1    00:00:02     /bin/bash C

$ ps -ejfH
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
root     28503     1 28474 11126  0 22:14 pts/1    00:00:32   /bin/bash A
root     28525 28503 28474 11126  0 22:14 pts/1    00:00:27     /bin/bash B 
root     28531 28503 28474 11126  0 22:14 pts/1    00:00:24     /bin/bash B 
root     28566 28503 28474 11126  0 22:14 pts/1    00:00:02     /bin/bash C
root      9692 28503 28474 11126  0 23:16 pts/1    00:00:00     /bin/bash A

$ ps -ejfH
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
root     28503     1 28474 11126  0 22:14 pts/1    00:00:33   /bin/bash A
root     28525 28503 28474 11126  0 22:14 pts/1    00:00:27     /bin/bash B 
root     28531 28503 28474 11126  0 22:14 pts/1    00:00:24     /bin/bash B 
root     28566 28503 28474 11126  0 22:14 pts/1    00:00:02     /bin/bash C
root     15686 28503 28474 11126  0 23:16 pts/1    00:00:00     /bin/bash A

Solution

  • Several bash features fork a subshell to execute part of a script, and my guess is that your A and B scripts use some of them. A subshell is a forked copy of the running shell, so ps lists it with the same command line as its parent (for example /bin/bash B), and it disappears as soon as that part of the script finishes. Besides explicitly enclosing commands in ( ... ), bash also creates subshells for the commands in a pipeline, for command substitution ($( ... ) or backticks), and for anything backgrounded with &. Here's a script that illustrates these:

    $ cat a
    #!/bin/bash
    
    echo "Initial subshell count: $BASH_SUBSHELL"
    ps -opid,ppid,command | egrep "PID|bash ./a"
    
    echo "input" | while read line; do
        echo "Subshell count in pipeline: $BASH_SUBSHELL"
        ps -opid,ppid,command | egrep "PID|bash ./a"
    done
    
    output=$(echo "Subshell count in \$(): $BASH_SUBSHELL"
       ps -opid,ppid,command | egrep "PID|bash ./a"
    )
    echo "$output"
    
    (   echo "Subshell count in (): $BASH_SUBSHELL"
        ps -opid,ppid,command | egrep "PID|bash ./a"
    )
    
    {   echo "Subshell count in backgrounded command: $BASH_SUBSHELL"
        ps -opid,ppid,command | egrep "PID|bash ./a"
    } &
    sleep 1
    $ ./a
    Initial subshell count: 0
      PID  PPID COMMAND
     1410   158 /bin/bash ./a
     1412  1410 egrep PID|bash ./a
    Subshell count in pipeline: 1
      PID  PPID COMMAND
     1410   158 /bin/bash ./a
     1414  1410 /bin/bash ./a
     1416  1414 egrep PID|bash ./a
    Subshell count in $(): 1
      PID  PPID COMMAND
     1410   158 /bin/bash ./a
     1417  1410 /bin/bash ./a
     1419  1417 egrep PID|bash ./a
    Subshell count in (): 1
      PID  PPID COMMAND
     1410   158 /bin/bash ./a
     1420  1410 /bin/bash ./a
     1422  1420 egrep PID|bash ./a
    Subshell count in backgrounded command: 1
      PID  PPID COMMAND
     1410   158 /bin/bash ./a
     1423  1410 /bin/bash ./a
     1426  1423 egrep PID|bash ./a
    

    (Note: in the echo ... | while ... example, both echo and the while loop execute in subshells, but the echo command exits too quickly for ps to show it.)
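
    If the transient extra entries are unwanted, one option is to restructure the constructs that fork. The following is only a minimal sketch, not taken from the original A or B scripts (which are not shown): it writes the same loop once as a pipeline and once with its input redirected from a here-document. The variable names pipeline_count and redirect_count are made up for illustration. In the pipeline form the loop runs in a forked subshell, so its variable changes are lost; in the redirected form the loop runs in the current shell.

```shell
#!/bin/bash
# Sketch: the same loop written two ways. All names here are illustrative.

# Pipeline form: bash forks a subshell to run the while loop, so a second
# "/bin/bash <script>" entry can show up in ps while the loop is running,
# and variable changes made inside the loop vanish when the subshell exits.
pipeline_count=0
printf 'one\ntwo\n' | while read -r line; do
    pipeline_count=$((pipeline_count + 1))
done
echo "count after pipeline loop: $pipeline_count"

# Redirection form: the loop reads from a here-document and runs in the
# current shell, so no extra shell process is forked for the loop and the
# variable changes persist after it finishes.
redirect_count=0
while read -r line; do
    redirect_count=$((redirect_count + 1))
done <<EOF
one
two
EOF
echo "count after redirected loop: $redirect_count"
```

    With bash's default settings the first count stays at 0 and the second ends at 2. Command substitutions and backgrounded blocks still fork, so this only removes the subshells that come from piping into a loop.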