I'm seeing a very, very strange behavior with bash (version 3.2.25 on RHEL 5.3).
I have a 'Launcher' script that does the following (as a foreground process, running in a terminal that remains open throughout):
The idea above is essentially to have the A, the C, and the two B's communicate with each other until they are killed by the user. (They keep running with a while sleep DURATION; do ... ; done loop.)
The Problem:
After the above 3 steps are complete, when I repeatedly issue ps -ef from another terminal window, I sometimes see a few additional, spurious instances of B (say B3, B4...) and/or sometimes an additional, spurious instance of A being listed!
These additional instances are transient -- they come and go from the ps -ef listing.
Further, these spurious instances happen to be children -- and not siblings -- of the valid (or, the desired) processes. For example, B3 and B4 would list B1 and B2, respectively, as their parent; similarly, the spurious A2 would list A as its parent!
Now, I am PRETTY DARNED SURE that I am in NO WAY creating any additional B instances from within a B, nor any A instance from inside an A.
So, what is going on here?
Many thanks, in advance.
PS: I saw a similar problem (multiple spurious instances) a while back in the context of cron jobs that were designed to hang around indefinitely after their first launch. There too, I would see multiple instances of my cron job even though I had explicit logic in place to prevent crond from launching any additional instances (by checking for the existence of a lock file on disk). And even there, I wasn't able to figure out the problem.
$ ps -ejfH
UID PID PPID PGID SID C STIME TTY TIME CMD
root 28503 1 28474 11126 0 22:14 pts/1 00:00:31 /bin/bash A
root 28525 28503 28474 11126 0 22:14 pts/1 00:00:26 /bin/bash B
root 16143 28525 28474 11126 0 23:14 pts/1 00:00:00 [B] <defunct>
root 16144 28525 28474 11126 0 23:14 pts/1 00:00:00 /bin/bash B
root 28531 28503 28474 11126 0 22:14 pts/1 00:00:23 /bin/bash B
root 28566 28503 28474 11126 0 22:14 pts/1 00:00:01 /bin/bash C
$ ps -ejfH
UID PID PPID PGID SID C STIME TTY TIME CMD
root 28503 1 28474 11126 0 22:14 pts/1 00:00:31 /bin/bash A
root 28525 28503 28474 11126 0 22:14 pts/1 00:00:26 /bin/bash B
root 28531 28503 28474 11126 0 22:14 pts/1 00:00:23 /bin/bash B
root 28566 28503 28474 11126 0 22:14 pts/1 00:00:01 /bin/bash C
root 18579 28503 28474 11126 0 23:14 pts/1 00:00:00 /bin/bash A
$ ps -ejfH
UID PID PPID PGID SID C STIME TTY TIME CMD
root 28503 1 28474 11126 0 22:14 pts/1 00:00:31 /bin/bash A
root 28525 28503 28474 11126 0 22:14 pts/1 00:00:26 /bin/bash B
root 22717 28525 28474 11126 0 23:14 pts/1 00:00:00 /bin/bash B
root 22718 22717 28474 11126 0 23:14 pts/1 00:00:00 /bin/bash B
root 28531 28503 28474 11126 0 22:14 pts/1 00:00:23 /bin/bash B
root 28566 28503 28474 11126 0 22:14 pts/1 00:00:01 /bin/bash C
$ ps -ejfH
UID PID PPID PGID SID C STIME TTY TIME CMD
root 28503 1 28474 11126 0 22:14 pts/1 00:00:31 /bin/bash A
root 28525 28503 28474 11126 0 22:14 pts/1 00:00:26 /bin/bash B
root 28531 28503 28474 11126 0 22:14 pts/1 00:00:23 /bin/bash B
root 28566 28503 28474 11126 0 22:14 pts/1 00:00:01 /bin/bash C
$ ps -ejfH
UID PID PPID PGID SID C STIME TTY TIME CMD
root 28503 1 28474 11126 0 22:14 pts/1 00:00:32 /bin/bash A
root 28525 28503 28474 11126 0 22:14 pts/1 00:00:27 /bin/bash B
root 28531 28503 28474 11126 0 22:14 pts/1 00:00:24 /bin/bash B
root 32021 28531 28474 11126 0 23:15 pts/1 00:00:00 /bin/bash B
root 32023 32021 28474 11126 0 23:15 pts/1 00:00:00 [B] <defunct>
root 28566 28503 28474 11126 0 22:14 pts/1 00:00:01 /bin/bash C
root 32013 28503 28474 11126 0 23:15 pts/1 00:00:00 /bin/bash A
$ ps -ejfH
UID PID PPID PGID SID C STIME TTY TIME CMD
root 28503 1 28474 11126 0 22:14 pts/1 00:00:32 /bin/bash A
root 28525 28503 28474 11126 0 22:14 pts/1 00:00:27 /bin/bash B
root 28531 28503 28474 11126 0 22:14 pts/1 00:00:24 /bin/bash B
root 28566 28503 28474 11126 0 22:14 pts/1 00:00:01 /bin/bash C
root 2310 28503 28474 11126 0 23:15 pts/1 00:00:00 /bin/bash A
root 2324 2310 28474 11126 0 23:15 pts/1 00:00:00 /bin/bash A
$ ps -ejfH
UID PID PPID PGID SID C STIME TTY TIME CMD
root 28503 1 28474 11126 0 22:14 pts/1 00:00:32 /bin/bash A
root 28525 28503 28474 11126 0 22:14 pts/1 00:00:27 /bin/bash B
root 28531 28503 28474 11126 0 22:14 pts/1 00:00:24 /bin/bash B
root 9219 28531 28474 11126 0 23:16 pts/1 00:00:00 [B] <defunct>
root 28566 28503 28474 11126 0 22:14 pts/1 00:00:02 /bin/bash C
$ ps -ejfH
UID PID PPID PGID SID C STIME TTY TIME CMD
root 28503 1 28474 11126 0 22:14 pts/1 00:00:32 /bin/bash A
root 28525 28503 28474 11126 0 22:14 pts/1 00:00:27 /bin/bash B
root 28531 28503 28474 11126 0 22:14 pts/1 00:00:24 /bin/bash B
root 28566 28503 28474 11126 0 22:14 pts/1 00:00:02 /bin/bash C
root 9692 28503 28474 11126 0 23:16 pts/1 00:00:00 /bin/bash A
$ ps -ejfH
UID PID PPID PGID SID C STIME TTY TIME CMD
root 28503 1 28474 11126 0 22:14 pts/1 00:00:33 /bin/bash A
root 28525 28503 28474 11126 0 22:14 pts/1 00:00:27 /bin/bash B
root 28531 28503 28474 11126 0 22:14 pts/1 00:00:24 /bin/bash B
root 28566 28503 28474 11126 0 22:14 pts/1 00:00:02 /bin/bash C
root 15686 28503 28474 11126 0 23:16 pts/1 00:00:00 /bin/bash A
There are a number of bash features that spawn a subshell to execute part of the script. My guess is that your A and B scripts are using some of these features. In addition to explicitly creating a subshell by enclosing commands in ( ... ), subshells will also be created for any bash commands run in a pipeline, in a command substitution ($( ... ) or backticks), or backgrounded with &. Here's a script that illustrates these:
$ cat a
#!/bin/bash
echo "Initial subshell count: $BASH_SUBSHELL"
ps -opid,ppid,command | egrep "PID|bash ./a"
echo "input" | while read line; do
    echo "Subshell count in pipeline: $BASH_SUBSHELL"
    ps -opid,ppid,command | egrep "PID|bash ./a"
done
output=$(echo "Subshell count in \$(): $BASH_SUBSHELL"
    ps -opid,ppid,command | egrep "PID|bash ./a"
)
echo "$output"
( echo "Subshell count in (): $BASH_SUBSHELL"
    ps -opid,ppid,command | egrep "PID|bash ./a"
)
{ echo "Subshell count in backgrounded command: $BASH_SUBSHELL"
    ps -opid,ppid,command | egrep "PID|bash ./a"
} &
sleep 1
$ ./a
Initial subshell count: 0
PID PPID COMMAND
1410 158 /bin/bash ./a
1412 1410 egrep PID|bash ./a
Subshell count in pipeline: 1
PID PPID COMMAND
1410 158 /bin/bash ./a
1414 1410 /bin/bash ./a
1416 1414 egrep PID|bash ./a
Subshell count in $(): 1
PID PPID COMMAND
1410 158 /bin/bash ./a
1417 1410 /bin/bash ./a
1419 1417 egrep PID|bash ./a
Subshell count in (): 1
PID PPID COMMAND
1410 158 /bin/bash ./a
1420 1410 /bin/bash ./a
1422 1420 egrep PID|bash ./a
Subshell count in backgrounded command: 1
PID PPID COMMAND
1410 158 /bin/bash ./a
1423 1410 /bin/bash ./a
1426 1423 egrep PID|bash ./a
(Note: in the echo ... | while ... example, both echo and the while loop execute in subshells; but the echo command exits too quickly for ps to show it.)
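
If the extra bash processes bother you, one common way to avoid the subshell for a while read loop is to feed it through a redirection instead of a pipe. Here is a minimal sketch of the difference. It assumes nothing about what your A and B scripts actually do, and the function names count_with_pipeline and count_with_redirect are made up for illustration:

```bash
#!/bin/bash
# Hypothetical helpers for illustration; not taken from the original A/B scripts.

# Pipeline version: the while loop is the last stage of a pipeline, so bash
# (with default options) runs it in a subshell. The increments happen in that
# child process and never reach the calling shell.
count_with_pipeline() {
    count=0
    printf 'a\nb\n' | while read -r line; do
        count=$((count + 1))
    done
    echo "$count"
}

# Redirection version: the loop reads from a here-document and runs in the
# current shell, so the loop itself needs no extra bash process and the
# count is still set after the loop ends.
count_with_redirect() {
    count=0
    while read -r line; do
        count=$((count + 1))
    done <<EOF
a
b
EOF
    echo "$count"
}

printf 'pipeline: '; count_with_pipeline
printf 'redirect: '; count_with_redirect
```

With bash's default options this prints "pipeline: 0" and "redirect: 2". The lost counter in the first case is a side effect of the same subshell that shows up as a transient child in your ps listings.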