
How to recover from segmentation fault on C++?


I have some production-critical code that has to keep running.

Think of the code as:

while (true){
   init();
   do_important_things();  //segfault here
   clean();
}

I can't trust the code to be bug-free, and I need to be able to log problems to investigate later.

This time, I know for a fact that a segmentation fault occurs somewhere in the code. I need to be able to at least log it and then start everything over.

Reading here, I found a few solutions, but each one is followed by a flame war claiming it will do more harm than good, with no real explanation. I also found this answer, which I am considering using, but I'm not sure it suits my use case.

So, what is the best way to recover from segmentation fault on C++?


Solution

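  • I suggest writing a very small, carefully hardened program whose only job is to monitor the buggy one. Have it restart the buggy program whenever the program exits in a way you don't like. After a segmentation fault, the state of the crashed process can no longer be trusted, so trying to recover inside it is unreliable. A separate process is unaffected by the crash and can safely log it and start a fresh copy.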

    POSIX example:

    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    
    #include <cstdio>
    #include <iostream>
    
    int main(int argc, char* argv[]) {
        if(argc < 2) {
            std::cerr << "USAGE: " << argv[0] << " program_to_monitor <arguments...>\n";
            return 1;
        }
    
        while(true) {
            pid_t child = fork();          // create a child process
    
            if(child == -1) {
                std::perror("fork");
                return 1;
            }
    
            if(child == 0) {
                execvp(argv[1], argv + 1); // start the buggy program
                std::perror(argv[1]);      // only reached if starting failed
                _exit(0);                  // exit with 0 so the monitor doesn't retry; _exit avoids
                                           // flushing stdio buffers inherited from the parent
            }
    
            // Wait for the buggy program to terminate and check the status
            // to see if it should be restarted.
    
            if(int wstatus; waitpid(child, &wstatus, 0) != -1) {
                if(WIFEXITED(wstatus)) {
                    if(WEXITSTATUS(wstatus) == 0) return 0; // normal exit, terminate
    
                    std::cerr << argv[0] << ": " << argv[1] << " exited with "
                              << WEXITSTATUS(wstatus) << '\n';
                }
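                if(WIFSIGNALED(wstatus)) {
                    std::cerr << argv[0] << ": " << argv[1]
                              << " terminated by signal " << WTERMSIG(wstatus);
    #ifdef WCOREDUMP // common extension, not required by POSIX
                    if(WCOREDUMP(wstatus)) std::cerr << " (core dumped)";
    #endif
                    std::cerr << '\n';
                }
                std::cerr << argv[0] << ": Restarting " << argv[1] << '\n';
                sleep(1); // brief pause so a program that crashes at startup doesn't spin the CPU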
            } else {
                std::perror("waitpid");
                return 1;                  // the monitor itself failed; report an error
            }
        }
    }
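You may also want the buggy program to record something itself before it dies. In that case, the monitor can be complemented by a fatal-signal handler that stays strictly within what is safe after a crash. This is likely the reason behind the flame wars you mention. After a SIGSEGV the process's memory may be corrupted, and a signal handler may only call async-signal-safe functions, so `std::cerr`, `printf`, `malloc`, and throwing exceptions are all off limits. The following is a hedged sketch, not a drop-in solution. It only writes a short message with `write()`, restores the default action, and re-raises the signal. The process therefore still terminates, and can still dump core, so the monitor still sees `WIFSIGNALED`. The names `install_crash_logger`, `crash_log_handler` and `format_crash_message` are hypothetical, and the chosen list of signals is an assumption you may want to adjust.

```cpp
#include <csignal>   // std::signal, std::raise
#include <cstddef>   // std::size_t
#include <signal.h>  // POSIX sigaction, sigemptyset
#include <unistd.h>  // write, STDERR_FILENO, ssize_t

// Hypothetical helper: builds "fatal signal <n>\n" in buf by hand, because
// snprintf and iostreams are not async-signal-safe. Returns the byte count.
std::size_t format_crash_message(int signo, char* buf, std::size_t cap) {
    static const char prefix[] = "fatal signal ";
    std::size_t n = 0;
    for (std::size_t i = 0; prefix[i] != '\0' && n < cap; ++i) buf[n++] = prefix[i];

    char digits[12];  // signal numbers are small; 12 digits is plenty
    int d = 0;
    unsigned v = signo < 0 ? 0u : static_cast<unsigned>(signo);
    do {
        digits[d++] = static_cast<char>('0' + v % 10);
        v /= 10;
    } while (v != 0 && d < 12);
    while (d > 0 && n < cap) buf[n++] = digits[--d];  // digits were stored in reverse

    if (n < cap) buf[n++] = '\n';
    return n;
}

// Hypothetical handler: log the signal number, then die with the same signal.
extern "C" void crash_log_handler(int signo) {
    char buf[32];
    std::size_t n = format_crash_message(signo, buf, sizeof buf);
    ssize_t ignored = write(STDERR_FILENO, buf, n);  // write() is async-signal-safe
    (void)ignored;                                  // nothing useful to do on failure here

    // Restore the default action and re-raise. The signal stays blocked while
    // this handler runs, so it is delivered (and kills the process with the
    // original signal) as soon as the handler returns.
    std::signal(signo, SIG_DFL);
    std::raise(signo);
}

// Hypothetical installer: call once near the start of the monitored program.
// Returns false if any sigaction() call fails.
bool install_crash_logger() {
    struct sigaction sa {};
    sa.sa_handler = crash_log_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  // no SA_NODEFER: the signal stays blocked inside the handler

    const int fatal_signals[] = {SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGABRT};
    for (int sig : fatal_signals) {
        if (sigaction(sig, &sa, nullptr) != 0) return false;
    }
    return true;
}
```

Call `install_crash_logger()` once near the top of the monitored program's `main`. A segfault caused by stack overflow would additionally need an alternate signal stack (`sigaltstack` plus `SA_ONSTACK`), which this sketch does not set up. For actual investigation, a core dump loaded into a debugger is usually far more informative than a one-line log.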