Tags: c, kubernetes, unix, signal-handling

C & Unix on Kubernetes - exit(0) doesn't work


I tried to run my C program on my local Kubernetes cluster, and at first glance everything works as I expect. Only my signal handler for terminating one process does not work completely.

There are two processes: one for administration and one for the "work". The administration process sends a SIGTERM to the worker process, and the signal handler catches this signal (I can see it in my logs). The only thing that does not work is terminating the worker process.

Here is the signal handler:

void handle_sigterm() {
    error_msg(sip_man_log, "(MAIN) INFO: Closing all sockets.");
    close(connfd);
    close(sockfd);
    close(sockfd_ext);
    free_manipulation_table(int_modification_table);
    free_manipulation_table(ext_modification_table);
    free_manipulation_table(mir_modification_table);

    error_msg(sip_man_log, "(MAIN) INFO: Terminating Server by SIGTERM");
    exit(0);
}

The last message is visible in the logs, but the process is still active, so I think the "exit(0);" does not work correctly (as I understand it, "exit(0)" should terminate the whole process).

If I run the same code on my local machine, it works as I expect. I'm relatively new to C and Unix programming, so I don't understand what's wrong here.


Solution

  • I found a solution to my problem. I think this was a Kubernetes-specific issue. After some research I found out that the worker process becomes a zombie after the SIGTERM is sent, and the parent process was not able to clean up the zombie. After adding tini as the init process (in the Dockerfile and deployment.yaml) and making sure it runs as PID 1, the zombies are cleaned up and the worker process terminates when I send the SIGTERM from the admin process.
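
    For illustration, here is a minimal sketch of what "cleaning up the zombie" means in C. It assumes the process that sends the SIGTERM is also the direct parent of the worker, which may not match my real setup (there tini does the reaping as PID 1). The helper name stop_worker is made up for this example.

    ```c
    #include <errno.h>
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical helper: ask the worker to terminate, then reap it.
     * Assumes the caller is the worker's parent process.
     * Returns 0 once the worker has been reaped, -1 on error. */
    int stop_worker(pid_t worker_pid)
    {
        int status;

        if (kill(worker_pid, SIGTERM) == -1)
            return -1;

        /* Until the parent collects the exit status with waitpid(),
         * the terminated worker stays in the process table as a zombie. */
        while (waitpid(worker_pid, &status, 0) == -1) {
            if (errno != EINTR)
                return -1;   /* real error, e.g. ECHILD: not our child */
            /* interrupted by another signal: retry the wait */
        }
        return 0;
    }
    ```

    If the parent never calls waitpid() (or wait()), the worker keeps showing up as a defunct process even though exit(0) itself worked.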

    I also tried to improve my signal handler to make it safe (thanks for the information in the comments). Here is the updated signal handler and some parts of the rest of the program:

    volatile atomic_int execute_loop = 1; 
    
    void handle_sigterm() {
        error_msg(sip_man_log, "(MAIN) INFO: Signal reached.");
        execute_loop = 0; 
        error_msg(sip_man_log, "(MAIN) INFO: Closing all sockets.");
        close(sockfd_ext);
        close(connfd); 
        close(sockfd); 
    }
    
    int main()
    {
        struct sockaddr_in  sockaddr, connaddr, sockaddr_ext;
        unsigned int        connaddr_len;
        char                buffer[8192];
        int                 rv, rv_ext;
    
        signal(SIGTERM, handle_sigterm); 
        
        sockfd = socket(AF_INET, SOCK_STREAM, 0);
        [...]
    
        while(execute_loop)
        {
            connaddr_len = sizeof(connaddr);
            connfd = accept(sockfd, (struct sockaddr*)&connaddr, &connaddr_len);
            [...]
        }
        free_manipulation_table(int_modification_table);
        free_manipulation_table(ext_modification_table); 
        free_manipulation_table(mir_modification_table);
        error_msg(sip_man_log, "(MAIN) INFO: Terminating Server by SIGTERM"); 
    
        return 0;
    }
    

    The code is heavily shortened and most of the error handling is omitted. Thanks for your help!

    Dennis
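
    As a follow-up to the comments about handler safety, here is a stricter variant as a rough sketch, not the code I actually run. The handler only sets a flag, and the handler is installed with sigaction() without SA_RESTART so that a blocking accept() returns with EINTR and the loop can check the flag. The function names install_sigterm_handler and serve_until_sigterm are made up for this example, and logging and socket cleanup are assumed to happen after the loop instead of inside the handler.

    ```c
    #include <errno.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* The only state the handler touches. */
    static volatile sig_atomic_t execute_loop = 1;

    /* Handler does nothing except clear the flag. */
    static void handle_sigterm(int signum)
    {
        (void)signum;
        execute_loop = 0;
    }

    /* Hypothetical helper: install the handler without SA_RESTART,
     * so an interrupted accept() fails with EINTR instead of restarting. */
    int install_sigterm_handler(void)
    {
        struct sigaction sa;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = handle_sigterm;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        return sigaction(SIGTERM, &sa, NULL);
    }

    /* Hypothetical accept loop: returns 0 after SIGTERM, -1 on error. */
    int serve_until_sigterm(int listen_fd)
    {
        while (execute_loop) {
            int connfd = accept(listen_fd, NULL, NULL);
            if (connfd == -1) {
                if (errno == EINTR)
                    continue;    /* interrupted: re-check the flag */
                return -1;
            }
            /* ... handle the connection here ... */
            close(connfd);
        }
        /* Logging, close() of the sockets and freeing tables go here. */
        return 0;
    }
    ```

    One gap remains in this sketch: a signal that arrives between the loop check and the call to accept() is not noticed until accept() returns for another reason. Closing that window would need something like pselect() or a self-pipe, which I left out.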