I recently wanted to get hands-on experience with fork/exec'ing a child process and redirecting its stdin, stdout, and stderr, so I wrote my own popen()- and pclose()-like functions named my_popen() and my_pclose(), inspired by Apple's open-source implementation of popen() and pclose().
By manual inspection (e.g. running ps in a different terminal to look for the expected child process), my_popen() seems to work: the expected child process shows up.
Question: Why does my_pclose() return immediately with errno == 10 (ECHILD) if I call it immediately after my_popen()? My expectation was that my_pclose() would wait until the child process ended.
Question: Given the above, why does my_pclose() return as expected -- after the child process gracefully ends -- if I insert a delay between my_popen() and my_pclose()?
Question: What correction(s) are needed for my_pclose() to reliably return only after the child process has ended, without the need for any delays or other contrivances?
MCVE below.
Some context: I wanted my_popen() to allow the user to 1) write to the child process' stdin, 2) read the child process' stdout, 3) read the child process' stderr, 4) know the child process' pid_t, 5) run in environments where fork/exec'ed processes might be either child or grandchild processes, and be able to kill the grandchild process in case of the latter (hence the setpgid()).
// main.c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
typedef int Pipe[2];
typedef enum PipeEnd {
READ_END = 0,
WRITE_END = 1
} PipeEnd;
#define INVALID_FD (-1)
#define INVALID_PID (0)
typedef struct my_popen_t {
bool success; ///< true if the child process was spawned.
Pipe stdin; ///< parent -> stdin[WRITE_END] -> child's stdin
Pipe stdout; ///< child -> stdout[WRITE_END] -> parent reads stdout[READ_END]
Pipe stderr; ///< child -> stderr[WRITE_END] -> parent reads stderr[READ_END]
pid_t pid; ///< child process' pid
} my_popen_t;
/** dup2( p[pe] ) then close and invalidate both ends of p */
static void dupFd( Pipe p, const PipeEnd pe, const int fd ) {
dup2( p[pe], fd);
close( p[READ_END] );
close( p[WRITE_END] );
p[READ_END] = INVALID_FD;
p[WRITE_END] = INVALID_FD;
}
/**
* Redirect a parent-accessible pipe to the child's stdin, and redirect the
* child's stdout and stderr to parent-accessible pipes.
*/
my_popen_t my_popen( const char* cmd ) {
my_popen_t r = { false,
{ INVALID_FD, INVALID_FD },
{ INVALID_FD, INVALID_FD },
{ INVALID_FD, INVALID_FD },
INVALID_PID };
if ( -1 == pipe( r.stdin ) ) { goto end; }
if ( -1 == pipe( r.stdout ) ) { goto end; }
if ( -1 == pipe( r.stderr ) ) { goto end; }
switch ( (r.pid = fork()) ) {
case -1: // Error
goto end;
case 0: // Child process
dupFd( r.stdin, READ_END, STDIN_FILENO );
dupFd( r.stdout, WRITE_END, STDOUT_FILENO );
dupFd( r.stderr, WRITE_END, STDERR_FILENO );
setpgid( getpid(), getpid() );
{
char* argv[] = { (char*)"sh", (char*)"-c", (char*)cmd, NULL };
// @todo Research why - as has been pointed out - _exit() should be
// used here, not exit().
if ( -1 == execvp( argv[0], argv ) ) { exit(0); }
}
}
// Parent process
close( r.stdin[READ_END] );
r.stdin[READ_END] = INVALID_FD;
close( r.stdout[WRITE_END] );
r.stdout[WRITE_END] = INVALID_FD;
close( r.stderr[WRITE_END] );
r.stderr[WRITE_END] = INVALID_FD;
r.success = true;
end:
if ( ! r.success ) {
if ( INVALID_FD != r.stdin[READ_END] ) { close( r.stdin[READ_END] ); }
if ( INVALID_FD != r.stdin[WRITE_END] ) { close( r.stdin[WRITE_END] ); }
if ( INVALID_FD != r.stdout[READ_END] ) { close( r.stdout[READ_END] ); }
if ( INVALID_FD != r.stdout[WRITE_END] ) { close( r.stdout[WRITE_END] ); }
if ( INVALID_FD != r.stderr[READ_END] ) { close( r.stderr[READ_END] ); }
if ( INVALID_FD != r.stderr[WRITE_END] ) { close( r.stderr[WRITE_END] ); }
r.stdin[READ_END] = r.stdin[WRITE_END] =
r.stdout[READ_END] = r.stdout[WRITE_END] =
r.stderr[READ_END] = r.stderr[WRITE_END] = INVALID_FD;
}
return r;
}
int my_pclose( my_popen_t* p ) {
if ( ! p ) { return -1; }
if ( ! p->success ) { return -1; }
if ( INVALID_PID == p->pid ) { return -1; }
{
pid_t pid = INVALID_PID;
int wstatus;
do {
pid = waitpid( -1 * (p->pid), &wstatus, 0 );
} while ( -1 == pid && EINTR == errno );
return ( -1 == pid ? pid : wstatus );
}
}
int main( int argc, char* argv[] ) {
my_popen_t p = my_popen( "sleep 3" );
//sleep( 1 ); // Uncomment this line for my_pclose() success.
int res = my_pclose( &p );
printf( "res: %d, errno: %d (%s)\n", res, errno, strerror( errno ) );
return 0;
}
Execution with undesired failure:
$ gcc --version && gcc -g ./main.c && ./a.out
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
res: -1, errno: 10 (No child processes)
Update:
This link made me wonder whether adding setpgid( pid, 0 ) in the parent process after fork()ing was relevant. It does appear to work: after making the addition, calling my_pclose() immediately after my_popen() waits until the process has completed.
Honestly, I don't quite understand why this made a difference; I'd be grateful if a knowledgeable community member could offer insight.
my_popen_t my_popen( const char* cmd ) {
my_popen_t r = { false,
{ INVALID_FD, INVALID_FD },
{ INVALID_FD, INVALID_FD },
{ INVALID_FD, INVALID_FD },
INVALID_PID };
if ( -1 == pipe( r.stdin ) ) { goto end; }
if ( -1 == pipe( r.stdout ) ) { goto end; }
if ( -1 == pipe( r.stderr ) ) { goto end; }
switch ( (r.pid = fork()) ) {
case -1: // Error
goto end;
case 0: // Child process
dupFd( r.stdin, READ_END, STDIN_FILENO );
dupFd( r.stdout, WRITE_END, STDOUT_FILENO );
dupFd( r.stderr, WRITE_END, STDERR_FILENO );
//setpgid( getpid(), getpid() ); // This looks unnecessary
{
char* argv[] = { (char*)"sh", (char*)"-c", (char*)cmd, NULL };
// @todo Research why - as has been pointed out - _exit() should be
// used here, not exit().
if ( -1 == execvp( argv[0], argv ) ) { exit(0); }
}
}
// Parent process
setpgid( r.pid, 0 ); // This is the relevant change
close( r.stdin[READ_END] );
r.stdin[READ_END] = INVALID_FD;
close( r.stdout[WRITE_END] );
r.stdout[WRITE_END] = INVALID_FD;
close( r.stderr[WRITE_END] );
r.stderr[WRITE_END] = INVALID_FD;
r.success = true;
end:
if ( ! r.success ) {
if ( INVALID_FD != r.stdin[READ_END] ) { close( r.stdin[READ_END] ); }
if ( INVALID_FD != r.stdin[WRITE_END] ) { close( r.stdin[WRITE_END] ); }
if ( INVALID_FD != r.stdout[READ_END] ) { close( r.stdout[READ_END] ); }
if ( INVALID_FD != r.stdout[WRITE_END] ) { close( r.stdout[WRITE_END] ); }
if ( INVALID_FD != r.stderr[READ_END] ) { close( r.stderr[READ_END] ); }
if ( INVALID_FD != r.stderr[WRITE_END] ) { close( r.stderr[WRITE_END] ); }
r.stdin[READ_END] = r.stdin[WRITE_END] =
r.stdout[READ_END] = r.stdout[WRITE_END] =
r.stderr[READ_END] = r.stderr[WRITE_END] = INVALID_FD;
}
return r;
}
The problem with your my_pclose() is that you are trying to perform a process-group wait instead of waiting for the specific child process. This:
pid = waitpid( -1 * (p->pid), &wstatus, 0 );
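attempts to wait for a child belonging to process group p->pid, but that cannot succeed until the child has actually been moved into a process group with that ID. The forked child is initially in the same process group as its parent, and that group's ID almost certainly differs from the child's PID. In your original code only the child calls setpgid(), so a parent that calls waitpid() right away can get there before the child has run that call, and the wait fails with ECHILD. A delay gives the child time to run it, and the parent-side setpgid() call you later added guarantees the group exists before the wait.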
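To see the failure in isolation, here is a minimal sketch, assuming a POSIX system. The helper name group_wait_errno_without_setpgid() is hypothetical and not part of your code. It forks a child that never calls setpgid(), attempts the same negative-PID wait, and reports the errno it got.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): fork a child that stays in the
 * parent's process group, try a process-group wait on -child, and return
 * the errno of that wait (0 if the wait unexpectedly succeeded, -1 if the
 * child was somehow not in the parent's group). */
int group_wait_errno_without_setpgid(void) {
    int status;
    int saved;
    int same_group;
    pid_t r;
    pid_t child = fork();
    assert(child != -1);
    if (child == 0) {
        pause();   /* block until the parent kills us; no setpgid() here */
        _exit(0);
    }
    /* The child inherited our process group, so its pgid is ours. */
    same_group = (getpgid(child) == getpgid(0));
    /* No process group with ID == child exists, so nothing matches. */
    errno = 0;
    r = waitpid(-child, &status, 0);
    saved = (r == -1) ? errno : 0;
    /* Clean up: terminate and reap the child by its specific PID. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return same_group ? saved : -1;
}
```

Calling this helper should yield ECHILD, the same "No child processes" error as in your output, because no child of the caller belongs to a process group whose ID equals the child's PID.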
Moreover, it's unclear why you are trying to wait on the process group in the first place. You know the specific process you want to wait for, and it would be incorrect for my_pclose() to collect a different one instead, regardless of whether it belongs to the same process group. You should wait for that specific process:
pid = waitpid(p->pid, &wstatus, 0 );
That will work either with or without the setpgid() call, but you should almost certainly omit that call in a general-purpose function such as this.
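Put together, the wait loop could look like the sketch below. This is only an illustration under the assumption of a POSIX system: wait_for_child() and spawn_exiting_child() are hypothetical names, and the former takes a bare pid_t instead of your my_popen_t so that it stands alone.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for one specific child, retrying when the
 * call is interrupted by a signal. Returns the raw wait status, or -1
 * with errno set if waitpid() fails. */
int wait_for_child(pid_t pid) {
    int wstatus;
    pid_t r;
    assert(pid > 0);   /* a positive PID selects exactly one child */
    do {
        r = waitpid(pid, &wstatus, 0);
    } while (r == -1 && errno == EINTR);
    return (r == -1) ? -1 : wstatus;
}

/* Hypothetical helper for the demonstration: fork a child that exits
 * immediately with the given code. Returns the child's PID, or -1. */
pid_t spawn_exiting_child(int code) {
    pid_t pid = fork();
    if (pid == 0) {
        _exit(code);
    }
    return pid;
}
```

For example, after pid = spawn_exiting_child(7), the status returned by wait_for_child(pid) satisfies WIFEXITED() and WEXITSTATUS() reports 7, no matter how quickly the wait follows the fork. A second wait_for_child(pid) fails with ECHILD because the child has already been reaped.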