I have the helper function below, which executes a command and returns its exit code on POSIX systems. I used to use popen, but it is impossible to get the return code of an application with popen if it runs and exits before popen/pclose gets a chance to do its work.
The following helper function forks a process, uses execvp to run the desired external process, and then the parent uses waitpid to get the return code. I'm seeing odd cases where it's refusing to run.
When called with wait = true, waitpid should return the exit code of the application no matter what. However, I'm seeing stdout output that indicates the return code should be non-zero, yet the return code is zero. Running the external process in a regular shell and then echoing $? prints a non-zero value, so it's not a problem with the external process returning the wrong code. If it's of any help, the external process being run is mount(8) (yes, I know I can use mount(2), but that's beside the point).
I apologize in advance for a code dump. Most of it is debugging/logging:
    inline int ForkAndRun(const std::string &command, const std::vector<std::string> &args, bool wait = false, std::string *output = NULL)
    {
        std::string debug;
        std::vector<char*> argv;
        for(size_t i = 0; i < args.size(); ++i)
        {
            argv.push_back(const_cast<char*>(args[i].c_str()));
            debug += "\"";
            debug += args[i];
            debug += "\" ";
        }
        argv.push_back((char*)NULL);
        neosmart::logger.Debug("Executing %s", debug.c_str());
        int pipefd[2];
        if (pipe(pipefd) != 0)
        {
            neosmart::logger.Error("Failed to create pipe descriptor when trying to launch %s", debug.c_str());
            return EXIT_FAILURE;
        }
        pid_t pid = fork();
        if (pid == 0)
        {
            close(pipefd[STDIN_FILENO]); //child isn't going to be reading
            dup2(pipefd[STDOUT_FILENO], STDOUT_FILENO);
            close(pipefd[STDOUT_FILENO]); //now that it's been dup2'd
            dup2(pipefd[STDOUT_FILENO], STDERR_FILENO);
            if (execvp(command.c_str(), &argv[0]) != 0)
            {
                exit(EXIT_FAILURE);
            }
            return 0;
        }
        else if (pid < 0)
        {
            neosmart::logger.Error("Failed to fork when trying to launch %s", debug.c_str());
            return EXIT_FAILURE;
        }
        else
        {
            close(pipefd[STDOUT_FILENO]);
            int exitCode = 0;
            if (wait)
            {
                waitpid(pid, &exitCode, wait ? __WALL : (WNOHANG | WUNTRACED));
                std::string result;
                char buffer[128];
                ssize_t bytesRead;
                while ((bytesRead = read(pipefd[STDIN_FILENO], buffer, sizeof(buffer)-1)) != 0)
                {
                    buffer[bytesRead] = '\0';
                    result += buffer;
                }
                if (wait)
                {
                    if ((WIFEXITED(exitCode)) == 0)
                    {
                        neosmart::logger.Error("Failed to run command %s", debug.c_str());
                        neosmart::logger.Info("Output:\n%s", result.c_str());
                    }
                    else
                    {
                        neosmart::logger.Debug("Output:\n%s", result.c_str());
                        exitCode = WEXITSTATUS(exitCode);
                        if (exitCode != 0)
                        {
                            neosmart::logger.Info("Return code %d", (exitCode));
                        }
                    }
                }
                if (output)
                {
                    result.swap(*output);
                }
            }
            close(pipefd[STDIN_FILENO]);
            return exitCode;
        }
    }
Note that the command runs OK with the correct parameters, the function proceeds without any problems, and WIFEXITED returns TRUE. However, WEXITSTATUS returns 0 when it should be returning something else.
I'm using the mongoose library, and grepping my code for SIGCHLD revealed that using mg_start from mongoose results in setting SIGCHLD to SIG_IGN.
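From the waitpid man page, on Linux a child that terminates while SIGCHLD is set to SIG_IGN does not become a zombie, so waitpid will fail if the process has already successfully run and exited, but will run OK if it hasn't yet. Since my code never checks the return value of waitpid, a failed call leaves exitCode at its initial value of 0, which WIFEXITED and WEXITSTATUS then read as a normal exit with status 0. This was the cause of the sporadic failure of my code.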
Simply re-setting SIGCHLD after calling mg_start to a void function that does absolutely nothing was enough to keep the zombie records from being immediately erased.
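The effect of the SIGCHLD disposition can be reproduced in isolation. The following is a minimal sketch assuming Linux semantics; NoOpHandler, SetSigchld, RunChild and WaitResult are hypothetical names used only for illustration, and SetSigchld(SIG_IGN) merely stands in for whatever a library does behind the scenes.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical no-op handler: its only purpose is to make the SIGCHLD
// disposition something other than SIG_IGN.
void NoOpHandler(int) {}

// Installs `handler` (SIG_IGN, SIG_DFL, or a function) for SIGCHLD.
bool SetSigchld(void (*handler)(int))
{
    struct sigaction sa = {};
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGCHLD, &sa, NULL) == 0;
}

struct WaitResult
{
    bool reaped;   // true if waitpid() delivered the child's status
    int exitCode;  // valid only when reaped is true
    int error;     // errno from fork()/waitpid() when reaped is false
};

// Forks a child that exits immediately with `code`, then waits for it.
WaitResult RunChild(int code)
{
    assert(code >= 0 && code <= 255);  // exit statuses are 8 bits wide
    WaitResult result = { false, 0, 0 };

    pid_t pid = fork();
    if (pid < 0)
    {
        result.error = errno;
        return result;
    }
    if (pid == 0)
    {
        _exit(code);  // child: terminate right away
    }

    int status = 0;
    pid_t waited;
    do
    {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR);  // retry if a handler interrupted the call

    if (waited < 0)
    {
        result.error = errno;  // ECHILD when no zombie record was kept
        return result;
    }
    if (WIFEXITED(status))
    {
        result.reaped = true;
        result.exitCode = WEXITSTATUS(status);
    }
    return result;
}
```

Calling SetSigchld(SIG_IGN) followed by RunChild(3) should leave reaped false with error set to ECHILD, while calling SetSigchld(NoOpHandler) first should yield an exitCode of 3. Restoring SIG_DFL instead of installing a no-op handler should work just as well, since the default disposition also keeps the zombie record around until it is waited for.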
Per @Geoff_Montee's advice, there was a bug in my redirect of STDERR (the pipe descriptor is closed before the second dup2 call), but this was not responsible for the problem: the exit status of the process started by execvp is not passed through STDERR or STDOUT. The kernel keeps it in the terminated child's process table entry (the zombie record) until the parent collects it with waitpid.
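For reference, the redirect works when both dup2 calls happen before the pipe descriptor is closed. This is a small sketch rather than a drop-in patch; RedirectOutputTo and CaptureBothStreams are hypothetical names, and the child writes fixed strings instead of calling execvp so the example stays self-contained.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Points both stdout and stderr at writeFd, then closes the original
// descriptor. Both dup2() calls must come before the close().
bool RedirectOutputTo(int writeFd)
{
    assert(writeFd >= 0);
    if (dup2(writeFd, STDOUT_FILENO) < 0) return false;
    if (dup2(writeFd, STDERR_FILENO) < 0) return false;
    if (writeFd != STDOUT_FILENO && writeFd != STDERR_FILENO)
    {
        close(writeFd);  // safe now: fds 1 and 2 still refer to the pipe
    }
    return true;
}

// Demonstration: the child writes one word to each stream and the parent
// collects both through a single pipe.
std::string CaptureBothStreams()
{
    int pipefd[2];
    if (pipe(pipefd) != 0) return "";

    pid_t pid = fork();
    if (pid < 0)
    {
        close(pipefd[0]);
        close(pipefd[1]);
        return "";
    }
    if (pid == 0)
    {
        close(pipefd[0]);  // child isn't going to be reading
        if (!RedirectOutputTo(pipefd[1])) _exit(127);
        if (write(STDOUT_FILENO, "out ", 4) != 4) _exit(1);
        if (write(STDERR_FILENO, "err", 3) != 3) _exit(1);
        _exit(0);
    }

    close(pipefd[1]);  // parent isn't going to be writing
    std::string result;
    char buffer[128];
    ssize_t bytesRead;
    while ((bytesRead = read(pipefd[0], buffer, sizeof(buffer))) > 0)
    {
        result.append(buffer, static_cast<size_t>(bytesRead));
    }
    close(pipefd[0]);
    waitpid(pid, NULL, 0);  // reap the child; its status is not needed here
    return result;
}
```

With the original ordering, the second dup2 would be handed an already-closed descriptor and fail with EBADF, so stderr would stay attached to wherever the parent's stderr pointed.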
@jilles' warning about non-contiguity of vector in C++ does not apply to C++03 and later (it is only valid for C++98, though in practice most C++98 compilers used contiguous storage anyway) and was not related to this issue. However, the advice on reading from the pipe before blocking in waitpid, and on checking the return value of waitpid, is spot-on.
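Putting that advice together, a helper along these lines drains the pipe first and only then waits, treating a failed waitpid as an error instead of as exit code 0. It is a sketch under stated assumptions: RunAndCapture is a hypothetical name, logging is omitted, and -1 is an arbitrary choice for "could not start, could not wait, or killed by a signal".

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs args[0] with the given arguments, captures stdout and stderr into
// `output`, and returns the exit code (or -1 on any failure).
int RunAndCapture(const std::vector<std::string> &args, std::string &output)
{
    assert(!args.empty());
    std::vector<char*> argv;
    for (size_t i = 0; i < args.size(); ++i)
    {
        argv.push_back(const_cast<char*>(args[i].c_str()));
    }
    argv.push_back(static_cast<char*>(NULL));

    int pipefd[2];
    if (pipe(pipefd) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0)
    {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0)
    {
        close(pipefd[0]);
        dup2(pipefd[1], STDOUT_FILENO);
        dup2(pipefd[1], STDERR_FILENO);
        close(pipefd[1]);
        execvp(argv[0], &argv[0]);
        _exit(127);  // only reached if execvp failed
    }

    close(pipefd[1]);  // otherwise read() below would never see EOF

    // Step 1: drain the pipe. A child that fills the pipe buffer blocks
    // until someone reads, so waiting first could deadlock.
    output.clear();
    char buffer[4096];
    for (;;)
    {
        ssize_t bytesRead = read(pipefd[0], buffer, sizeof(buffer));
        if (bytesRead > 0)
            output.append(buffer, static_cast<size_t>(bytesRead));
        else if (bytesRead == 0 || errno != EINTR)
            break;  // EOF or a real error
    }
    close(pipefd[0]);

    // Step 2: reap the child and check that waitpid() actually succeeded.
    int status = 0;
    pid_t waited;
    do
    {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR);

    if (waited != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

For example, running sh with the arguments -c and "printf hello; exit 7" should fill output with hello and return 7. If SIGCHLD is still being ignored, the same call should return -1 rather than a misleading 0.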