waitpid/wexitstatus returning 0 instead of correct return code

I have the helper function below, used to execute a command and get the return value on posix systems. I used to use popen, but it is impossible to get the return code of an application with popen if it runs and exits before popen/pclose gets a chance to do its work.

The following helper function creates a process fork, uses execvp to run the desired external process, and then the parent uses waitpid to get the return code. I'm seeing odd cases where it's refusing to run.

When called with wait = true, waitpid should return the exit code of the application no matter what. However, I'm seeing stdout output that specifies the return code should be non-zero, yet the return code is zero. Testing the external process in a regular shell, then echoing $? returns non-zero, so it's not a problem w/ the external process not returning the right code. If it's of any help, the external process being run is mount(8) (yes, I know I can use mount(2) but that's besides the point).

I apologize in advance for a code dump. Most of it is debugging/logging:

inline int ForkAndRun(const std::string &command, const std::vector<std::string> &args, bool wait = false, std::string *output = NULL)
{
    std::string debug;

    std::vector<char*> argv;
    for(size_t i = 0; i < args.size(); ++i)
    {
        argv.push_back(const_cast<char*>(args[i].c_str()));
        debug += "\"";
        debug += args[i];
        debug += "\" ";
    }
    argv.push_back((char*)NULL);

    neosmart::logger.Debug("Executing %s", debug.c_str());

    int pipefd[2];

    if (pipe(pipefd) != 0)
    {
        neosmart::logger.Error("Failed to create pipe descriptor when trying to launch %s", debug.c_str());
        return EXIT_FAILURE;
    }

    pid_t pid = fork();

    if (pid == 0)
    {
        close(pipefd[STDIN_FILENO]); //child isn't going to be reading
        dup2(pipefd[STDOUT_FILENO], STDOUT_FILENO);
        close(pipefd[STDOUT_FILENO]); //now that it's been dup2'd
        dup2(pipefd[STDOUT_FILENO], STDERR_FILENO);

        if (execvp(command.c_str(), &argv[0]) != 0)
        {
            exit(EXIT_FAILURE);
        }
        return 0;
    }
    else if (pid < 0)
    {
        neosmart::logger.Error("Failed to fork when trying to launch %s", debug.c_str());
        return EXIT_FAILURE;
    }
    else
    {
        close(pipefd[STDOUT_FILENO]);

        int exitCode = 0;

        if (wait)
        {
            waitpid(pid, &exitCode, wait ? __WALL : (WNOHANG | WUNTRACED));

            std::string result;
            char buffer[128];
            ssize_t bytesRead;
            while ((bytesRead = read(pipefd[STDIN_FILENO], buffer, sizeof(buffer)-1)) != 0)
            {
                buffer[bytesRead] = '\0';
                result += buffer;
            }

            if (wait)
            {
                if ((WIFEXITED(exitCode)) == 0)
                {
                    neosmart::logger.Error("Failed to run command %s", debug.c_str());
                    neosmart::logger.Info("Output:\n%s", result.c_str());
                }
                else
                {
                    neosmart::logger.Debug("Output:\n%s", result.c_str());
                    exitCode = WEXITSTATUS(exitCode);
                    if (exitCode != 0)
                    {
                        neosmart::logger.Info("Return code %d", (exitCode));
                    }
                }
            }

            if (output)
            {
                result.swap(*output);
            }
        }

        close(pipefd[STDIN_FILENO]);

        return exitCode;
    }
}

Note that the command is run OK with the correct parameters, the function proceeds without any problems, and WIFEXITED returns TRUE. However, WEXITSTATUS returns 0, when it should be returning something else.

Solution

I'm using the mongoose library, and grepping my code for SIGCHLD revealed that using mg_start from mongoose results in setting SIGCHLD to SIG_IGN.

From the waitpid man page, on Linux a SIGCHLD set to SIG_IGN will not create a zombie process, so waitpid will fail if the process has already successfully run and exited - but will run OK if it hasn't yet. This was the cause of the sporadic failure of my code.

Simply re-setting SIGCHLD after calling mg_start to a void function that does absolutely nothing was enough to keep the zombie records from being immediately erased.

Per @Geoff_Montee's advice, there was a bug in my redirect of STDERR, but this was not responsible for the problem as execvp does not store the return value in STDERR or even STDOUT, but rather in the kernel object associated with the parent process (the zombie record).

@jilles' warning about non-contiguity of vector in C++ does not apply for C++03 and up (only valid for C++98, though in practice, most C++98 compilers did use contiguous storage, anyway) and was not related to this issue. However, the advice on reading from the pipe before blocking and checking the output of waitpid is spot-on.