c signals fork exec

Process killed more than once doesn't receive SIGUSR1 signal


I have two C programs. One is father.c:

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>

pid_t childPid;
void createChild(){
    fprintf(stderr,"creating a new child\n");
    childPid=fork();
    if(childPid==0){
        execlp("./child","child",NULL);
        perror("err");
        exit(1); /* exec failed: don't let the child fall through into the parent's loop */
    }
    fprintf(stderr,"created child pid is %d\n",childPid);
}
void signalHandler(int signal){
    if(signal==SIGUSR1){
        kill(childPid,SIGINT);
        createChild();
    }
}

int main(int argc,char ** argv){
    signal(SIGUSR1,signalHandler);
    signal(SIGCHLD,SIG_IGN);    
    createChild();/*create first child*/    
    while(1){
        sleep(2);
        int ret=kill(childPid,SIGUSR1);
        if(ret==-1){
            perror("exit err ");
        }
    }
}


The other is child.c:

#include <stdio.h>
#include <unistd.h>
#include <signal.h>

void signalHandler(int signal){
    if(signal==SIGUSR1){
        fprintf(stderr,"child process received signal\n");
        kill(getppid(),SIGUSR1);
    }
}

int main(int argc,char ** argv){
    fprintf(stderr,"i'm the new child with pid %d\n",getpid());
    if(signal(SIGUSR1,signalHandler)==SIG_ERR){
        fprintf(stderr,"error");
    }
    while(1){}
}

When father is started, every 2 seconds it should kill the child, fork, and start a new child. The kill works as a handshake: the father sends the child a SIGUSR1 signal, the child handles it and forwards it to the father (as another SIGUSR1 signal), and the father then kills the child with a SIGINT. The problem I am facing is that the second time the child is created, it no longer receives the SIGUSR1 signal. Can someone help?

EDIT: Thanks to @CraigEstey (see his answer below), who figured out that having the child send a different signal to the parent process gets the job done. Following this advice, the code I posted above should be changed as follows to make it work. In father.c, replace if(signal==SIGUSR1) with if(signal==SIGUSR2), and signal(SIGUSR1,signalHandler); with signal(SIGUSR2,signalHandler);. In child.c, replace kill(getppid(),SIGUSR1); with kill(getppid(),SIGUSR2);. If this is not what you want, I advise you to read @CraigEstey's answer, where he explains everything in detail and gives working code that uses the same signal number for both processes.


Solution

  • A few things ...

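    When the child signals the parent, the parent is in the sleep call. The signal handler cuts that call short (an EINTR-style early return): sleep returns the unslept time instead of waiting the full 2 seconds.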

    So, on the second round, the parent will kill the child [possibly] before the child is ready to receive the signal.
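    To make the early return concrete, here is a minimal sketch of a sleep that resumes after each interruption. It is an alternative to the qsleep in common.c further down, not a copy of it; sleep_full is a made-up name, and the sketch relies on nanosleep writing the remaining time into its second argument when it fails with EINTR.

```c
#include <errno.h>
#include <time.h>

/* Sleep for the full `sec` seconds even if signal handlers run meanwhile.
 * Returns how many times the sleep was interrupted, or -1 on error.
 * (Hypothetical helper, not part of the posted code.) */
int sleep_full(time_t sec)
{
    struct timespec req = { .tv_sec = sec, .tv_nsec = 0 };
    struct timespec rem;
    int interrupted = 0;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;          /* e.g. EINVAL for a negative duration */
        ++interrupted;
        req = rem;              /* resume with the time that is left */
    }
    return interrupted;
}
```

    With something like this in the parent's loop, the next SIGUSR1 goes out a full 2 seconds after the previous one, regardless of how many handlers ran in between.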

    I did some major hacking to add debug, waitpid, etc.

    The primary issue is [either] that the second child doesn't get a SIGUSR1 from the parent, or that the child doesn't send the signal to the parent, or that the parent is blocked. From the second logfile below, it seems that the second child never gets the SIGUSR1 from the parent.

    I added additional signal calls to rearm the handlers [probably not needed]. I also added waitpid calls.
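    The kill-then-waitpid idea can be sketched on its own as below. kill_and_reap is a made-up name and the return convention is my own choice, so treat it as an illustration rather than the code used for the logs.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Send `signo` to the child `pid`, then wait until that child is gone.
 * Returns the signal that terminated the child, 0 if it exited normally,
 * or -1 if kill() or waitpid() failed.
 * (Hypothetical helper, not part of the posted code.) */
int kill_and_reap(pid_t pid, int signo)
{
    int status;
    pid_t r;

    if (kill(pid, signo) == -1)
        return -1;

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);    /* retry if a handler interrupted us */

    if (r == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

    One caveat: when SIGCHLD is set to SIG_IGN, as in the posted parent, waitpid still blocks until the child has terminated but then fails with ECHILD, so this helper would return -1 even though the child is gone.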

    I couldn't get things to work until I had parent and child use different signals. That is, parent sends SIGUSR1 to child and child sends SIGUSR2 to parent.

    That may not be what you want, and I don't [yet] see why the signal number difference matters.

    Looking at the signal masks in /proc/pid/status may help discern what may be going on.

    Edit: Got the code fully working. See the UPDATE section below for details.


    Here is the "bad" log. The parent will send the second signal before the child can do setup:

    SIGUSR1 is 10
    SIGUSR2 is 12
    SIGTERM is 15
    SIGINT is 2
    SIGDN is 10
    SIGUP is 10
    SIGFIX is 0
    KILLERR is 0
    QSLEEP is 0
    
    creating a new child
    created child pid is 738524
    i'm the new child with pid 738524
    while forever
    killing child with SIGDN 738524
    child got signal 10
    child killing parent 738523
    parent received signal 10
    killing child with SIGINT 738524
    waitpid on 738524
    
    creating a new child
    created child pid is 738525
    killing child with SIGDN 738525
    i'm the new child with pid 738525
    while forever
    killing child with SIGDN 738525
    killing child with SIGDN 738525
    killing child with SIGDN 738525
    killing child with SIGDN 738525
    

    Note that if we add the KILLERR option to the build, the kill call of SIGUSR1 from parent to child will produce an ESRCH error (No such process).


    If we make the parent's sleep loop until the full 2 seconds have elapsed (i.e. it recalculates the remaining time after each interruption), we get a different sequence. The parent will only send SIGUSR1 after the child has had time to set up:

    SIGUSR1 is 10
    SIGUSR2 is 12
    SIGTERM is 15
    SIGINT is 2
    SIGDN is 10
    SIGUP is 10
    SIGFIX is 0
    KILLERR is 0
    QSLEEP is 1
    
    creating a new child
    created child pid is 739105
    i'm the new child with pid 739105
    while forever
    killing child with SIGDN 739105
    child got signal 10
    child killing parent 739104
    parent received signal 10
    killing child with SIGINT 739105
    waitpid on 739105
    
    creating a new child
    created child pid is 739106
    i'm the new child with pid 739106
    while forever
    killing child with SIGDN 739106
    killing child with SIGDN 739106
    killing child with SIGDN 739106
    killing child with SIGDN 739106
    

    Here is the working log:

    SIGUSR1 is 10
    SIGUSR2 is 12
    SIGTERM is 15
    SIGINT is 2
    SIGDN is 10
    SIGUP is 12
    SIGFIX is 1
    KILLERR is 0
    QSLEEP is 1
    
    creating a new child
    created child pid is 740214
    i'm the new child with pid 740214
    while forever
    killing child with SIGDN 740214
    child got signal 10
    child killing parent 740213
    parent received signal 12
    killing child with SIGINT 740214
    waitpid on 740214
    
    creating a new child
    created child pid is 740215
    i'm the new child with pid 740215
    while forever
    killing child with SIGDN 740215
    child got signal 10
    child killing parent 740213
    parent received signal 12
    killing child with SIGINT 740215
    waitpid on 740215
    
    creating a new child
    created child pid is 740216
    i'm the new child with pid 740216
    while forever
    killing child with SIGDN 740216
    child got signal 10
    child killing parent 740213
    parent received signal 12
    killing child with SIGINT 740216
    waitpid on 740216
    
    creating a new child
    created child pid is 740218
    i'm the new child with pid 740218
    while forever
    killing child with SIGDN 740218
    child got signal 10
    child killing parent 740213
    parent received signal 12
    killing child with SIGINT 740218
    waitpid on 740218
    
    creating a new child
    created child pid is 740219
    i'm the new child with pid 740219
    while forever
    

    UPDATE:

    Originally, I followed your comment that you used sigaction, so I didn't try that.

    But, I added sigaction as an option (along with a sigprocmask call to unblock the signal--which may be more important).

    This worked even with the signal numbers being the same.

    I've updated the code below but I've left the log files [above] from the previous post.


    Here is the source code. I added a common.c and Makefile:

    ==> child.c <==
    #include <stdio.h>
    #include <unistd.h>
    #include <signal.h>
    #include <string.h>
    
    #include "common.c"
    
    pid_t ppid;
    
    void
    signalHandler(int signo)
    {
        xsignal2(SIGDN, signalHandler);
    
        msg2("child got signal",signo);
    
        if (signo == SIGDN) {
            msg2("child killing parent",ppid);
            qkill(ppid, SIGUP);
        }
    }
    
    int
    main(int argc, char **argv)
    {
    
        ppid = getppid();
    
        xsignal(SIGDN, signalHandler);
        msg2("i'm the new child with pid",getpid());
    
        msg("while forever\n");
        while (1) {
        }
    }
    
    ==> common.c <==
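    /* everything common.c itself uses, so it no longer depends on the includer */
    #include <errno.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>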
    
    typedef long long tsc_t;
    #define NSEC        1000000000
    
    #ifndef SIGFIX
    #define SIGFIX      0
    #endif
    
    #ifndef SIGACT
    #define SIGACT      0
    #endif
    
    #ifndef KILLERR
    #define KILLERR     0
    #endif
    
    #if SIGFIX
    #define SIGDN       SIGUSR1
    #define SIGUP       SIGUSR2
    #else
    #define SIGDN       SIGUSR1
    #define SIGUP       SIGUSR1
    #endif
    
    #ifndef QSLEEP
    #define QSLEEP      0
    #endif
    
    void
    xsignal(int signo,void (*fnc)(int))
    {
    
    #if SIGACT
        struct sigaction act;
        sigset_t set;
    
        memset(&act,0,sizeof(act));
        act.sa_handler = fnc;
        sigemptyset(&act.sa_mask);
        sigaction(signo,&act,NULL);
    
        sigemptyset(&set);
        sigaddset(&set,signo);
        sigprocmask(SIG_UNBLOCK,&set,NULL);
    #else
        signal(signo,fnc);
    #endif
    }
    
    void
    xsignal2(int signo,void (*fnc)(int))
    {
    
    #if ! SIGACT
        xsignal(signo,fnc);
    #endif
    }
    
    tsc_t
    tscget(void)
    {
        struct timespec ts;
        tsc_t tsc;
    
        clock_gettime(CLOCK_MONOTONIC,&ts);
    
        tsc = ts.tv_sec;
        tsc *= NSEC;
        tsc += ts.tv_nsec;
    
        return tsc;
    }
    
    void
    msg(const char *str)
    {
        size_t len = strlen(str);
        write(2,str,len);
    }
    
    void
    num(tsc_t val)
    {
        char *rhs;
        char *lhs;
        char buf[100];
    
        lhs = buf;
        rhs = buf;
    
        while (val > 0) {
            *rhs++ = (val % 10) + '0';
            val /= 10;
        }
    
        if (rhs <= buf)
            *rhs++ = '0';
    
        *rhs-- = 0;
    
        for (;  lhs < rhs;  ++lhs, --rhs) {
            int tmp = *lhs;
            *lhs = *rhs;
            *rhs = tmp;
        }
    
        msg(buf);
    }
    
    void
    msg2(const char *str,tsc_t val)
    {
    
        msg(str);
        msg(" ");
        num(val);
        msg("\n");
    }
    
    void
    qkill(pid_t pid,int signo)
    {
        int err;
    
        err = kill(pid,signo);
        if (err < 0) {
            err = errno;
    #if KILLERR
            msg2("qkill: failed -- err is",err);
            exit(1);
    #endif
        }
    }
    
    void
    qsleep(int sec)
    {
        tsc_t nsec;
        tsc_t beg;
        tsc_t now;
        struct timespec ts;
    
        nsec = sec;
        nsec *= NSEC;
    
        while (nsec > 0) {
            beg = tscget();
    
            ts.tv_nsec = nsec % NSEC;
            ts.tv_sec = nsec / NSEC;
            nanosleep(&ts,NULL);
    
            now = tscget();
            now -= beg;
    
            nsec -= now;
        }
    }
    
    ==> parent.c <==
    #include <stdio.h>
    #include <unistd.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <syscall.h>
    
    #include "common.c"
    
    pid_t childPid;
    
    void
    createChild()
    {
    
        msg("\n");
        msg("creating a new child\n");
    
    #if 0
        childPid = fork();
    #else
        childPid = syscall(SYS_fork);
    #endif
        if (childPid == 0) {
            execlp("./child", "child", NULL);
            perror("err");
            exit(1);
        }
    
        msg("created child pid is ");
        num(childPid);
        msg("\n");
    }
    
    void
    signalHandler(int signo)
    {
    
        msg2("parent received signal",signo);
    
        if (signo == SIGUP) {
            xsignal2(SIGUP, signalHandler);
    
            /* NOTE: the message still says SIGINT, but SIGKILL is what gets sent below */
            msg2("killing child with SIGINT",childPid);
    #if 0
            qkill(childPid, SIGTERM);
    #else
            qkill(childPid, SIGKILL);
    #endif
    
            msg2("waitpid on",childPid);
            waitpid(childPid,NULL,0);
    
            createChild();
        }
    }
    
    int
    main(int argc, char **argv)
    {
    
        msg2("SIGUSR1 is",SIGUSR1);
        msg2("SIGUSR2 is",SIGUSR2);
        msg2("SIGTERM is",SIGTERM);
        msg2("SIGINT is",SIGINT);
        msg2("SIGDN is",SIGDN);
        msg2("SIGUP is",SIGUP);
    
        msg2("SIGFIX is",SIGFIX);
        msg2("SIGACT is",SIGACT);
        msg2("KILLERR is",KILLERR);
        msg2("QSLEEP is",QSLEEP);
    
        xsignal(SIGUP, signalHandler);
        signal(SIGCHLD, SIG_IGN);
    
        createChild();                      /* create first child */
    
        while (1) {
    #if QSLEEP
            qsleep(2);
    #else
            sleep(2);
    #endif
    
            msg2("killing child with SIGDN",childPid);
            int ret = kill(childPid, SIGDN);
    
            if (ret == -1) {
                perror("exit err ");
            }
    
            //msg2("waitpid on",childPid);
            //waitpid(childPid,NULL,0);
        }
    }
    
    ==> Makefile <==
    PGM += child
    PGM += parent
    
    all: $(PGM)
    
    $(PGM):
        cc -o $@ $@.c $(CFLAGS) -Wall
    
    clean:
        rm -f $(PGM)