Tags: c, linux, systems-programming, waitpid

Systems programming: wait(&status)'s return value


While learning about forking and piping, I came across the following excellent tutorial: https://www.cs.rutgers.edu/~pxk/416/notes/c-tutorials/pipe.html

However, the tutorial goes on to discuss how to establish a pipe between two child processes spawned by the same parent. While it does a great job of that, one piece of code has confounded me for a while:

while ((pid = wait(&status)) != -1) /* pick up all the dead children*/
    fprintf(stderr, "process %d exits with %d\n", pid, WEXITSTATUS(status));
exit(0);

I am really confused about wait(&status) (yes, I have read the man page at http://linux.die.net/man/2/wait). We just declare an int status, never really give it a value, and just pass it to wait. Is this status set transparently in the wait() function?

The man page says:

wait(): on success, returns the process ID of the terminated child; on error, -1 is returned.

So in the above lines of code, the while loop exits when wait(&status) returns -1. This is jarring: was there an error, and why? How do we ensure that the parent keeps spinning until all its children terminate properly? What is 'status' anyway, and how is it set?

Edit: To add, the program does compile and run perfectly.


Solution

  • When you call wait(&status), the & says "take the address of" status.

    So, you're passing a pointer to status, and not its value.

    The wait routine will use this pointer to fill in status with a value before it returns. This is in addition to the function's return value.

    Thus, status will be valid after the call.

    The reason that you don't need to set status to anything before the call is that wait only sets a value and does not try to use the value beforehand.

    status encodes several things: whether the child exited normally (with exit code 0, or with an error code of, say, 1), whether it was terminated by a signal (e.g. SIGSEGV), and whether it was "stopped" (used by strace, gdb, and other debuggers). The wait(2) man page lists a set of helper macros, such as WIFEXITED, WEXITSTATUS, WIFSIGNALED, and WTERMSIG, that decode the status value. Note that WEXITSTATUS is only meaningful when WIFEXITED(status) is true.

    As for the return value, -1 does mean an error, but here the error (reported in the global errno) is typically ECHILD. That just means the calling process has no remaining children to wait for. In other words, the program has waited for and "reaped" every child it previously started via fork, so the loop ending is the expected outcome rather than a failure.
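
To make the points above concrete, here is a minimal sketch (not the tutorial's code) of a reaping loop that passes an uninitialized status to wait(), decodes it with the macros, and checks errno to tell "no children left" apart from a real failure. The names report_status, reap_all_children, and demo are hypothetical helpers invented for this illustration.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: explain a status word that wait() filled in. */
void report_status(pid_t pid, int status)
{
    if (WIFEXITED(status))          /* child called exit()/_exit() or returned from main */
        fprintf(stderr, "process %d exited with code %d\n",
                (int)pid, WEXITSTATUS(status));
    else if (WIFSIGNALED(status))   /* child was terminated by a signal */
        fprintf(stderr, "process %d was killed by signal %d\n",
                (int)pid, WTERMSIG(status));
    else
        fprintf(stderr, "process %d changed state (raw status 0x%x)\n",
                (int)pid, (unsigned)status);
}

/* Hypothetical helper: reap every child of the calling process.
 * Returns the number of children reaped, or -1 on an unexpected error. */
int reap_all_children(void)
{
    int status;      /* left uninitialized on purpose: wait() only writes to it */
    int reaped = 0;

    for (;;) {
        pid_t pid = wait(&status);   /* pass the ADDRESS so wait() can fill it in */
        if (pid == -1) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal handler: retry */
            if (errno == ECHILD)
                return reaped;       /* no children left: the normal way out */
            perror("wait");          /* anything else is a genuine error */
            return -1;
        }
        report_status(pid, status);
        reaped++;
    }
}

/* Hypothetical demo: fork three children, then reap them all.
 * Children 0 and 1 exit with codes 0 and 1; child 2 kills itself with SIGKILL. */
int demo(void)
{
    for (int i = 0; i < 3; i++) {
        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");
            break;                   /* still reap whatever was started */
        }
        if (pid == 0) {              /* child process */
            if (i == 2)
                raise(SIGKILL);      /* terminated by a signal, never returns */
            _exit(i);                /* _exit avoids flushing the parent's stdio buffers */
        }
    }
    return reap_all_children();
}
```

Calling demo() from main should report three children (in whatever order they finish) and return 3. A second call to reap_all_children() right afterwards returns 0, because wait() fails immediately with ECHILD when there is nothing left to reap.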