
New to pipe() and fork() in C++


Environment: Linux 2.6.32 (RHEL 6.3) on x86_64 with gcc 4.4.6

Background: I am doing some heavy data crunching: ~500 GB of input data spread over ~2000 files. My main process forks N children, each of which receives a list of filenames to crunch.

What I want is for console I/O to pass through the parent. I have been looking into pipe() and see some fascinating stuff about using poll() to have my parent block until there are error messages to read. It seems that I need to have N pipes (one per child) and tell poll() which events I want to listen for on each one. Also, I think that once I dup2(pipe[1], STDOUT_FILENO) in each child, each child should be able to write to the pipe with cout << stuff; as usual, right?

First, is what I have said above about multiple pipes, poll()ing and dup2() correct?

Second, how do I set up the parent poll() loop so that I move on once all the children have died?

Right now, this (incomplete) section of code reads as follows:

int status;
while (1) { // wait for stuff
    while ((status = poll(pollfds, ss.max_forks, -1)) > 1)
        cout << "fork "<< status << ": " << pipes[status][0];
    if (status == -1)   Die(errno, "poll error");
    if (status == 0) { // check that we still have at least one open fd
        bool still_running = false;
        for (int i=0; i<ss.max_forks; i++) {
             // check pipe i and set still_running if it is not zero
        }
        if (!still_running)
            break;
    }
}

Third, what should I set and when should I set it with fcntl()? Do I want to do O_ASYNC? Do I want to do blocking or nonblocking?


Solution

  • Actually, you need to close() the respective "unused" end of each pipe in both processes (parent and child), to make sure end-of-file comes across when the child exits. Pipe[0] is the read end and Pipe[1] is the write end. Thus, the child writes into Pipe[1] and closes Pipe[0], while the parent reads from Pipe[0] and closes its own copy of Pipe[1].

    If you do this, the parent's read() returns 0 (end-of-file) once the child has died and any data still in the pipe has been read. Don't forget to use one of the waitpid()-style functions to clean up the dead processes.

    You might want to set the descriptors to nonblocking, so you can just read whatever is there without having to use 1-byte reads, which would be horribly inefficient. That said, I just make one call to read() with a suitable buffer size (usually 1024 or 4096) and let the next poll() trigger if there's more data. But then, I usually have just one child to work with, not a few hundred :-)

    As for your loop, you'll have to track the state of each child, and exit when you have no live children left.

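    EDIT: in my own code, I assume the child is dead when I get a 0-byte read even though POLLIN was set, or when I get the POLLERR or POLLHUP flags. Both cases occur in practice. POLLHUP can be reported while unread data is still sitting in the pipe, so the safest approach is to keep reading until read() returns 0 and only then close the descriptor.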
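
The points above (one pipe per child, closing the unused ends, nonblocking reads, and tracking which children are still alive) can be put together roughly as follows. This is only a minimal sketch, not the asker's code: `run_children` and its `work` callback are hypothetical names invented for illustration, the output is collected into strings rather than printed, and error handling is reduced to exceptions.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from the question): forks n children, redirects
// each child's stdout into its own pipe, and returns what each child printed.
std::vector<std::string> run_children(int n, void (*work)(int)) {
    assert(n >= 0);
    std::vector<pollfd> fds(n);
    std::vector<pid_t> pids(n);
    std::cout.flush();
    fflush(NULL);  // so children do not inherit (and re-emit) buffered output

    for (int i = 0; i < n; ++i) {
        int p[2];  // p[0] is the read end, p[1] is the write end
        if (pipe(p) == -1) throw std::runtime_error("pipe failed");
        pid_t pid = fork();
        if (pid == -1) throw std::runtime_error("fork failed");
        if (pid == 0) {                          // child
            close(p[0]);                         // child never reads
            for (int j = 0; j < i; ++j)          // drop inherited read ends
                close(fds[j].fd);                // of earlier siblings
            if (dup2(p[1], STDOUT_FILENO) == -1) _exit(127);
            close(p[1]);                         // stdout is now the pipe
            work(i);                             // may use cout or printf
            std::cout.flush();
            fflush(stdout);                      // _exit() does not flush
            _exit(0);
        }
        close(p[1]);  // parent never writes; required for EOF to arrive
        int flags = fcntl(p[0], F_GETFL);
        fcntl(p[0], F_SETFL, flags | O_NONBLOCK);
        fds[i].fd = p[0];
        fds[i].events = POLLIN;
        pids[i] = pid;
    }

    std::vector<std::string> out(n);
    int open_fds = n;  // number of pipes that have not reached EOF yet
    char buf[4096];
    while (open_fds > 0) {
        int ready = poll(&fds[0], fds.size(), -1);
        if (ready == -1) {
            if (errno == EINTR) continue;
            throw std::runtime_error("poll failed");
        }
        for (int i = 0; i < n; ++i) {
            if (fds[i].fd < 0 || fds[i].revents == 0) continue;
            ssize_t got = read(fds[i].fd, buf, sizeof buf);
            if (got > 0) {
                out[i].append(buf, static_cast<size_t>(got));
                continue;  // more data, if any, is picked up by the next poll
            }
            if (got == -1 && (errno == EAGAIN || errno == EINTR)) continue;
            close(fds[i].fd);  // got == 0 (EOF) or a real error: child is done
            fds[i].fd = -1;    // poll() ignores entries with a negative fd
            --open_fds;
        }
    }
    for (int i = 0; i < n; ++i) waitpid(pids[i], NULL, 0);  // reap zombies
    return out;
}
```

Calling `run_children(3, some_function)` where `some_function(i)` prints a line would return three strings, one per child. The loop ends when every pipe has reached end-of-file rather than when poll() returns 0, which with a timeout of -1 it never does. Note also that the return value of poll() is the number of ready descriptors, not an index, so each entry's `revents` has to be checked individually.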