I've encountered a funny problem on Unix systems (tested on SunOS and AIX): I execute a script and want it to list itself (using ps). Sometimes ps displays two additional child processes of the script, sometimes it's only one extra child process, and most of the time the output correctly shows a single process. I found the thread "Multiple processes with the same name" here, but my case is different.
Consider a script named test.sh like the one below:
#!/bin/ksh
echo before $$
ps -f | grep test.sh | grep -v grep
echo after $$
It's very simple: it prints its PID, then looks for itself (and anything similarly named) in the process list, filtering out only the grep command (just in case). Now I run a very simple loop in the shell:
while [ 1 -eq 1 ]; do test.sh; done
This is just an infinite loop that executes test.sh over and over. What do I get in the output? See below:
before 20990
user 20990 14993 0 08:54:06 pts/5 0:00 /bin/ksh test.sh
after 20990
before 20994
user 20994 14993 0 08:54:06 pts/5 0:00 /bin/ksh test.sh
after 20994
before 20998
user 21001 20998 0 08:54:06 pts/5 0:00 /bin/ksh test.sh
user 21000 20998 0 08:54:06 pts/5 0:00 /bin/ksh test.sh
user 20998 14993 0 08:54:06 pts/5 0:00 /bin/ksh test.sh
after 20998
before 21002
user 21002 14993 0 08:54:06 pts/5 0:00 /bin/ksh test.sh
after 21002
before 21006
user 21006 14993 0 08:54:07 pts/5 0:00 /bin/ksh test.sh
after 21006
Can anyone explain to me what the processes 21001 and 21000 are? They don't appear to be forked copies of the script, since there are no "before/after" traces for them. This happens only occasionally...
This is not much of a problem for me but I'm curious to know what happens here and what to expect in more complex cases.
Let's say I want to allow my script to run only if no other instance of it is already running. I would then use ps, filter for "test.sh", and filter out all lines containing my PID. That way the script filters out itself and its children, which is good, but that's just a workaround for an issue that I don't really understand. Hence this thread :)
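A minimal sketch of that check, assuming a POSIX awk that supports -v (on older Solaris that may mean nawk) and the usual ps -f column order of UID, PID, PPID. The function name other_instances is made up for illustration; it reads ps-style lines on standard input so the filtering can be tried on saved output as well:

```shell
#!/bin/sh
# Hypothetical helper: read `ps -f` style lines (UID PID PPID ...) on stdin
# and print those that mention the script name ($2) but are neither this
# process ($1) nor one of its direct children.
other_instances() {
    awk -v me="$1" -v name="$2" '
        index($0, name) && $2 != me && $3 != me { print }
    '
}

# Possible use near the top of test.sh. The snapshot is taken first, so
# the awk process doing the filtering cannot show up in it:
#   snapshot=$(ps -ef)
#   if [ -n "$(printf "%s\n" "$snapshot" | other_instances "$$" test.sh)" ]; then
#       echo "test.sh is already running" >&2
#       exit 1
#   fi
```

Matching on the PPID column as well as the PID column is what drops the extra child entries shown above; anything else whose command line happens to contain test.sh (an editor, for instance) would still be reported.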
I didn't play with fetching the actual data stored in /proc, since I don't know the filesystem's structure on Sun or AIX.
These extra processes are subshells that have been forked to run pipeline components (here, the two grep commands) but have not yet called exec.
This can be confirmed by running a dtrace script that shows all exec calls. The ps output from one such run:
before 3929
root 3929 1630 0 10:36:03 pts/3 0:00 /bin/ksh ./test.sh
root 3932 3929 0 10:36:03 pts/3 0:00 /bin/ksh ./test.sh
root 3931 3929 0 10:36:03 pts/3 0:00 /bin/ksh ./test.sh
after 3929
dtrace output for these processes:
2013 Sep 5 10:36:02 3929 /bin/ksh ./test.sh
2013 Sep 5 10:36:02 3931 grep test.sh
2013 Sep 5 10:36:02 3932 grep -v grep
The reason /bin/ksh ./test.sh is displayed instead of the actual command being run is that the argument list (argv) has not been updated yet. It is replaced only after the exec call has completed.
Just after a fork, the parent and the child process have the same argument list. What distinguishes them in the ps output is the process ID (and the parent process ID). This is what you are observing.
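The effect can be reproduced on purpose with a subshell that is kept alive without exec'ing anything. This is only a sketch, assuming a ps that accepts the POSIX -o and -p options; the function name show_fork_window is made up for illustration:

```shell
#!/bin/sh
# Sketch: a forked subshell is listed with its parent's argument list
# for as long as it has not exec'ed another program.
show_fork_window() {
    # The trailing ':' is a builtin, so the shell has to keep a real
    # subshell around instead of replacing it with sleep.
    ( sleep 1; : ) &
    child=$!
    # 'args=' prints the command line without a header line.
    ps -o args= -p "$$"       # the script itself
    ps -o args= -p "$child"   # the forked subshell
    wait "$child"
}
```

Calling show_fork_window from a script should print the same command line twice, once for the script and once for the subshell, which is what the extra ps lines in the question correspond to.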