OK here's a long one, brace yourself! :)
Recently I tried launching a watchdog script, written in bash, during boot. To do that, I added the following line to rc.local:
su someuser -c "/home/someuser/watchdog.sh &"
watchdog.sh looks like this:
#!/bin/bash
until /home/someuser/eventMonitoring.py
do
sleep 1
done
All is fine and good: the script gets started. However, a new process appears in the process list and stays there forever:
UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
root 3048 1 0 1024 620 1 20:04 ? 00:00:00 startpar -f -- rc.local
Now, my script (watchdog.sh) was launched and successfully detached, because its PPID is also 1. I was then on a mission to find out what that other process is. startpar is part of the sysvinit boot system (http://savannah.nongnu.org/projects/sysvinit). I'm currently on Debian Wheezy 7.4.0, which uses that system. Now, man startpar says:
startpar is used to run multiple run-level scripts in parallel.
By trial and error I figured out how to launch my script properly during boot without leaving startpar hanging: all file descriptors of the process need to be redirected to a file or to /dev/null, or closed altogether. Which, when you think about it, is a rational thing to do. I finally did it like this:
su someuser -c "some_script.sh >/dev/null 2>&1 &"
That resolved the issue, but it still left me wondering why that is. Why does startpar behave like this? Is it a bug or a feature?
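For anyone who wants to check what a backgrounded process actually inherits, here is a minimal sketch. The helper name detach is made up for illustration, it additionally redirects stdin (which the line above does not), and it assumes a Linux /proc filesystem like the one on the Debian box in question:

```shell
#!/bin/bash
# detach (hypothetical helper): start a command in the background with
# stdin, stdout and stderr all pointed at /dev/null, then print its PID.
detach() {
    "$@" </dev/null >/dev/null 2>&1 &
    echo "$!"
}

pid=$(detach sleep 5)
sleep 1   # give the child a moment to finish applying its redirections

# On Linux, /proc/<pid>/fd shows where each descriptor points.
for fd in 0 1 2; do
    echo "fd $fd -> $(readlink "/proc/$pid/fd/$fd")"
done

kill "$pid"
```

If any of the three descriptors still pointed at a terminal or pseudoterminal, whoever reads the other side of it would have a reason to keep waiting.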
So I dived a bit into the code (http://svn.savannah.nongnu.org/viewvc/startpar/trunk/startpar.c?root=sysvinit&view=markup) and started going from the end to the beginning:
First I located where that startpar -f -- rc.local call is made:
line 741:
execlp(myname, myname, "-f", "--", p->name, NULL);
OK, so this will actually start a new startpar process which will replace the currently running instance. It's basically a recursive call on itself. Let's look at what that -f parameter does:
line 866:
case 'f':
forw = 1;
break;
OK, let's see what setting the forw variable to 1 does...
line 900:
if (forw)
do_forward();
And finally let's see what's up with that function:
line 615:
void do_forward(void)
{
    char buf[4096], *b;
    ssize_t r, rr;
    setsid();
    while ((r = read(0, buf, sizeof(buf))))
    {
        if (r < 0)
        {
            if (errno == EINTR)
                continue;
#if defined(DEBUG) && (DEBUG > 0)
            perror("\n\rstartpar: forward read");
#endif
            break;
        }
        b = buf;
        while (r > 0)
        {
            rr = write(1, b, r);
            if (rr < 0)
            {
                if (errno == EINTR)
                    continue;
                perror("\n\rstartpar: forward write");
                rr = r;
            }
            r -= rr;
            b += rr;
        }
    }
    _exit(0);
}
As far as I understand it, this forwards everything that arrives on file descriptor 0 to file descriptor 1, until the read returns end-of-file or fails. Now let's see what those file descriptors are really linked to:
root@server:~# ls -al /proc/3048/fd
total 0
dr-x------ 2 root root 0 Apr 2 21:13 .
dr-xr-xr-x 8 root root 0 Apr 2 21:13 ..
lrwx------ 1 root root 64 Apr 2 21:13 0 -> /dev/ptmx
lrwx------ 1 root root 64 Apr 2 21:13 1 -> /dev/console
lrwx------ 1 root root 64 Apr 2 21:13 2 -> /dev/console
Hmm, interesting... So ptmx is, according to man:
The file /dev/ptmx is a character file with major number 5
and minor number 2, usually of mode 0666 and owner.group of root.root.
It is used to create a pseudoterminal master and slave pair.
and console:
The current console is also addressed by
/dev/console or /dev/tty0, the character device with major number 4
and minor number 0.
And at that point I came here to Stack Overflow. Now, can someone tell me what is going on here? Did I get this right: startpar is left in a state of constantly forwarding whatever comes from ptmx to the console? Why is it doing that? Why ptmx? Is this a bug?
This is definitely NOT a bug in startpar, which is doing exactly what it promises to do in the first place. From its man page:
The output of each script is buffered and written when the script exits, so output lines of different scripts won't mix. You can modify this behaviour by setting a timeout.
Within the run() function in startpar.c:
Line 422: Obtain a handle to the master pseudoterminal (/dev/ptmx in this case)
p->fd = getpt();
Line 429: Obtain the path of the corresponding slave pseudoterminal
else if ((m = ptsname(p->fd)) == 0 || grantpt(p->fd) || unlockpt(p->fd))
Line 438: Fork a child process
if ((p->pid = fork()) == (pid_t)-1)
Line 475: Invalidate default stdout
TEMP_FAILURE_RETRY(close(1));
Line 476: Open the slave pseudoterminal. Because fd 1 was just closed, open() returns 1, i.e. the stdout of the child now points to the slave pseudoterminal (and whatever is written to it can be read from the master pseudoterminal).
if (open(m, O_RDWR) != 1)
Line 481: Also capture stderr by duplicating the slave pseudoterminal fd onto it.
TEMP_FAILURE_RETRY(dup2(1, 2));
Line 561: After some bookkeeping, launch the executable of interest (as the child process)
execlp(p->name, p->arg0, (char *)0);
The parent process can then later capture all the output/error logs of this newly launched process by reading the buffered master pseudoterminal and logging it to the actual stdout (i.e. /dev/console in this case).
How to avoid a lingering startpar -f ... process on your system?
Explicitly marking an executable as interactive tells startpar to skip the pseudoterminal master/slave trickery used to buffer the terminal I/O, because any output of an interactive executable needs to be displayed on screen immediately and not buffered.
This modifies the flow of execution in several places, mainly at Line 1171, where startpar does NOT call the run() function for an interactive executable.
This has been tested and described here.
Redirect stdout and stderr of the executable to be launched.
The construct ">/dev/null 2>&1 &" discards the stdout/stderr of the executable to be launched. If both are explicitly sent to /dev/null, startpar does NOT buffer them indefinitely as it usually does otherwise. The backgrounded process then no longer keeps the slave pseudoterminal open, so nothing is left for startpar to wait on.
Adjust the timeouts of startpar.
Either configure timo in startpar.c:
The timeout set with the -t option is used as buffer timeout. If the output buffer of a script is not empty and the last output was timeout seconds ago, startpar will flush the buffer.
or gtimo in startpar.c:
The -T option timeout works more globally. If no output is printed for more than global_timeout seconds, startpar will flush the buffer of the script with the oldest output. Afterwards it will only print output of this script until it is finished.