c shell waitpid

Differentiating process states using waitpid and WNOHANG


While building a shell program, I'm having trouble recognizing process states. I have a list of child processes and I'm trying to determine each one's state using waitpid with WNOHANG. I want to distinguish between three states: TERMINATED, RUNNING and SUSPENDED (as defined in the code below), and set each process's status to one of them. Right now, however, the function below marks running processes as terminated and never recognizes suspended processes. What am I doing wrong, and how should updateProcessList be written to achieve this?

#define TERMINATED  -1
#define RUNNING 1
#define SUSPENDED 0

typedef struct process{
    cmdLine* cmd;                     /* the parsed command line*/
    pid_t pid;                        /* the process id that is running the command*/
    int status;                       /* status of the process: RUNNING/SUSPENDED/TERMINATED */
    struct process *next;             /* next process in chain */
} process;

void updateProcessList(process **process_list) {
    process *p = *process_list;
    int code = 0, status = 0,pidd = 0;
    while (p) {
        pidd = p->pid;
        code = waitpid(pidd, &status, WNOHANG);
        if (code == -1) {            /* child terminated*/
            p->status = TERMINATED;
        } else if(WIFEXITED(status)){
            p->status = TERMINATED;
        }else if(WIFSTOPPED(status)){
            p->status = SUSPENDED;
        }
        p = p->next;
    }
}

Solution

  • From man 2 waitpid:

    RETURN VALUE
    
        waitpid():  on  success, returns the process ID of the child whose state has changed;
        if WNOHANG was specified and one or more child(ren) specified by pid exist, but  have
        not yet changed state, then 0 is returned.  On error, -1 is returned.
    

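    You should check for a return value of 0, and also fix the rest of the checks. In your code, when waitpid() returns 0 the status variable is typically left unmodified (0 on the first iteration), and WIFEXITED(0) evaluates to true on common implementations, which is why running children end up marked as TERMINATED.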

    code = waitpid(pidd, &status, WNOHANG | WUNTRACED | WCONTINUED);
    
    if (code == -1) {
        // Handle error somehow... 
        // This doesn't necessarily mean that the child was terminated!
        // See manual page section "ERRORS".
    
        if (errno == ECHILD) {
            // No such child: it was already reaped elsewhere, or is not a child of this process.
            p->status = TERMINATED;
        } else {
            perror("waitpid failed");
        }
    } else if (code == 0) {
        // Child still in previous state.
        // Do nothing.
    } else if (WIFEXITED(status)) {
        // Child exited.
        p->status = TERMINATED;
    } else if (WIFSIGNALED(status)) {
        // Child killed by a signal.
        p->status = TERMINATED;
    } else if (WIFSTOPPED(status)) {
        // Child stopped.
        p->status = SUSPENDED;
    } else if (WIFCONTINUED(status)) {
        // This branch seems unnecessary, you should already know this
        // since you are the one that should kill(pid, SIGCONT) to make the
        // children continue.
        p->status = RUNNING; 
    } else {
        // This should never happen!
        abort();
    }
    

    Also, notice:

    1. My addition of WUNTRACED and WCONTINUED to the flags: WIFSTOPPED() cannot be true unless you are tracing the child with ptrace() or you passed the WUNTRACED flag, and WIFCONTINUED() cannot be true unless WCONTINUED is passed.
    2. The code and pidd variables should be pid_t, not int (the pidd variable also seems unneeded, since p->pid can be passed directly).

    In any case, consider adding a signal handler for SIGCHLD and updating the children's statuses there. Your program will receive a SIGCHLD whenever a child terminates, stops or resumes. This is much simpler and also faster, since it does not require continuously calling waitpid() on every single child process.