c linux unix posix

Does the OS (POSIX) finish a modification to a memory-mapped file if the process is SIGKILLed?


A similar post discusses whether changes to a memory-mapped file are flushed to disk after a SIGKILL, but what happens if the process is SIGKILLed in the middle of performing a change (e.g. a write or delete) to the memory buffer, before it is flushed to disk?

Does the underlying file get updated and corrupted? Is the write/delete operation finished before killing the process? Are there any safeguards for this?


Solution

  • Let's say you have something like

    volatile unsigned char  *map; /* memory-mapped file */
    size_t                   i;
    
    for (i = 0; i < 1000; i++)
        map[i] = slow_calculation(i);
    

    and for some reason, the process gets killed when i = 502.

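    In such a case, the contents of the file will indeed reflect the content of the mapping at that point: elements 0 through 501 hold the new values, and the rest still hold the old ones. (This assumes a shared mapping; the stores already made live in the kernel's page cache, so they reach the file even though the process is gone.)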

    No, there is no way to avoid this (with regards to the KILL signal), because KILL is unblockable and uncatchable.
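
    To see this behaviour in practice, here is a minimal sketch. The function name count_updated_after_kill() and the /tmp path are assumptions made for illustration. A child maps a 1000-byte file with MAP_SHARED, stores 0xFF into it byte by byte, and sends itself KILL just before storing element 502; the parent then reads the file back and counts the updated bytes. It relies on read and mmap seeing the same page cache, which holds on Linux but is not spelled out by POSIX without msync().

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <signal.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define MAP_LEN  1000
    #define KILL_AT  502

    /* Hypothetical demo: returns the number of bytes in the file that hold
     * the new value (0xFF) after the writer was killed mid-loop, or -1 on error. */
    static int count_updated_after_kill(void)
    {
        char           path[] = "/tmp/mapkill-XXXXXX";  /* assumed location */
        unsigned char  buf[MAP_LEN];
        int            fd, status, count = 0;
        pid_t          child;
        size_t         i;

        fd = mkstemp(path);
        if (fd == -1)
            return -1;
        unlink(path);                       /* the file lives on through fd */
        if (ftruncate(fd, MAP_LEN) == -1) { /* 1000 zero bytes */
            close(fd);
            return -1;
        }

        child = fork();
        if (child == -1) {
            close(fd);
            return -1;
        }
        if (child == 0) {
            volatile unsigned char *map = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                                               MAP_SHARED, fd, 0);
            if (map == MAP_FAILED)
                _exit(1);
            for (i = 0; i < MAP_LEN; i++) {
                if (i == KILL_AT)
                    kill(getpid(), SIGKILL);  /* does not return */
                map[i] = 0xFF;
            }
            _exit(2);                         /* not reached */
        }

        /* Parent: make sure the child really died from KILL. */
        if (waitpid(child, &status, 0) != child ||
            !WIFSIGNALED(status) || WTERMSIG(status) != SIGKILL) {
            close(fd);
            return -1;
        }

        if (pread(fd, buf, MAP_LEN, 0) != MAP_LEN) {
            close(fd);
            return -1;
        }
        for (i = 0; i < MAP_LEN; i++)
            if (buf[i] == 0xFF)
                count++;

        close(fd);
        return count;
    }
    ```

    On a Linux system the function should report that exactly the stores made before the KILL (elements 0 through 501) are visible in the file, and nothing after them.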

    You can minimize the window by using a temporary buffer as a "transactional" buffer: calculate the new values into that buffer, and then just copy them over to the mapping. It is no guarantee, but it does mean there is a much higher probability that the file contents are intact even if the process is killed. (Furthermore, it means that if you use e.g. mutexes to synchronize access to the mapping, you only need to hold the mutex for the minimum amount of time.)

    Killing a process via the KILL signal is a very abnormal termination, and having memory-mapped files garbled because of that is, in my opinion, expected. It is not something that should be done during normal operation at all; the TERM signal is used for that.

    What you should worry about is that your process responds to a TERM signal in a timely fashion. TERM is catchable and blockable, and is basically a way for an external supervisor process (or the user the process belongs to, or the superuser) to request that the process exit cleanly as soon as possible. However, the process should not dally, because it is quite common to send the process a KILL signal if it doesn't exit within a few seconds of receiving a TERM signal.

    In my own daemons, I strive for them to respond to a TERM within a second or so, unless the system is under a heavy load. It is, of course, a very subjective measurement since the speed of different systems varies, but there are no hard and fast rules here.
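
    For illustration, the supervisor side of that TERM-then-KILL convention could look roughly like the sketch below. The names stop_child() and spawn_idle_child(), the 10 ms polling interval, and the return convention are all assumptions made for this example, not part of any standard API.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <errno.h>
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    /* Hypothetical helper: ask child `pid` to exit with TERM, wait up to
     * `grace_ms` milliseconds, then fall back to KILL. Returns the signal
     * that terminated the child, 0 if it exited on its own, -1 on error. */
    static int stop_child(pid_t pid, int grace_ms)
    {
        const struct timespec  tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
        int                    status, waited_ms = 0;
        pid_t                  r;

        if (kill(pid, SIGTERM) == -1)
            return -1;

        for (;;) {
            r = waitpid(pid, &status, WNOHANG);
            if (r == pid)
                break;                          /* child is gone */
            if (r == -1 && errno != EINTR)
                return -1;
            if (waited_ms >= grace_ms) {
                /* Grace period is over: KILL cannot be caught or ignored. */
                if (kill(pid, SIGKILL) == -1)
                    return -1;
                do {
                    r = waitpid(pid, &status, 0);
                } while (r == -1 && errno == EINTR);
                if (r == -1)
                    return -1;
                break;
            }
            nanosleep(&tick, NULL);
            waited_ms += 10;
        }

        return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
    }

    /* Demo helper: fork a child that only sleeps. If ignore_term is nonzero,
     * the child ignores TERM, like a process that never responds to it. */
    static pid_t spawn_idle_child(int ignore_term)
    {
        sigset_t  term, old;
        pid_t     pid;

        /* Block TERM across fork() so the child can set its disposition
         * before any TERM is delivered to it. */
        sigemptyset(&term);
        sigaddset(&term, SIGTERM);
        sigprocmask(SIG_BLOCK, &term, &old);

        pid = fork();
        if (pid == 0) {
            signal(SIGTERM, ignore_term ? SIG_IGN : SIG_DFL);
            sigprocmask(SIG_UNBLOCK, &term, NULL);
            for (;;)
                pause();
        }

        sigprocmask(SIG_SETMASK, &old, NULL);
        return pid;
    }
    ```

    Calling stop_child(spawn_idle_child(0), 500) should report that TERM was enough, while stop_child(spawn_idle_child(1), 100) should report KILL, since the second child ignores TERM.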

    One way to handle this is to install a TERM signal handler that, in normal operation, terminates the process immediately. For critical sections, the exit is postponed:

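    #include <errno.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    
    static volatile int  in_critical = 0;   /* nesting depth of critical sections */
    static volatile int  need_to_exit = 0;  /* set once an exit has been requested */
    
    static void handle_exit_signal(int signum)
    {
        (void)signum;
        __atomic_store_n(&need_to_exit, 1, __ATOMIC_SEQ_CST);
        /* _exit() rather than exit(): exit() is not async-signal-safe. */
        if (!__atomic_load_n(&in_critical, __ATOMIC_SEQ_CST))
            _exit(126);
    }
    
    /* Returns 0 on success, or an errno value on failure. */
    static int install_exit(int signum)
    {
        struct sigaction  act;
        memset(&act, 0, sizeof act);
        sigemptyset(&act.sa_mask);
        act.sa_handler = handle_exit_signal;
        act.sa_flags = SA_RESTART;
        if (sigaction(signum, &act, NULL) == -1)
            return errno;
        return 0;
    }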
    

    To enter and exit critical sections (say, when you hold a mutex within the shared memory region):

    static inline void critical_begin(void)
    {
        __atomic_add_fetch(&in_critical, 1, __ATOMIC_SEQ_CST);
    }
    
    static inline void critical_end(void)
    {
        if (!__atomic_sub_fetch(&in_critical, 1, __ATOMIC_SEQ_CST))
            if (__atomic_load_n(&need_to_exit, __ATOMIC_SEQ_CST))
                exit(126);
    }
    

    So, if a TERM signal is received while you are in a critical section (and critical_begin() and critical_end() do nest), the final call to critical_end() exits the process.

    Note that I used the GCC atomic built-ins to manage the flags atomically, without data races, even if the signal handler is executed in a different thread. I've found this the cleanest solution for Linux, although it should work on other OSes too. (Other C compilers you can use on Linux, like clang and Intel CC, support those built-ins too.)
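
    If you would rather avoid compiler-specific built-ins, roughly the same bookkeeping can be sketched with C11 <stdatomic.h>. This is a swapped-in alternative, not what the code above uses. The _c11 names are made up for this example, and the functions return a flag instead of exiting so the logic is easy to check; it assumes atomic_int is lock-free on your platform, which is what makes it usable from a signal handler.

    ```c
    #include <stdatomic.h>

    static atomic_int  in_critical_c11  = 0;  /* nesting depth */
    static atomic_int  need_to_exit_c11 = 0;  /* exit requested? */

    /* To be called from the signal handler. Records the request and returns
     * nonzero if no critical section is open, i.e. exiting right now is fine. */
    static int request_exit_c11(void)
    {
        atomic_store(&need_to_exit_c11, 1);
        return atomic_load(&in_critical_c11) == 0;
    }

    static void critical_begin_c11(void)
    {
        atomic_fetch_add(&in_critical_c11, 1);
    }

    /* Returns nonzero when the outermost critical section was just closed
     * and an exit request is pending; the caller should then exit. */
    static int critical_end_c11(void)
    {
        /* atomic_fetch_sub() returns the value held before the subtraction. */
        return atomic_fetch_sub(&in_critical_c11, 1) == 1 &&
               atomic_load(&need_to_exit_c11);
    }
    ```

    The default memory order for these calls is sequentially consistent, matching the __ATOMIC_SEQ_CST used with the built-ins.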

    So, in pseudocode, doing the slow 1000-element calculation as shown in the beginning, would then be

    volatile unsigned char  *map;
    unsigned char            cache[1000];
    size_t                   i;
    
    /* Nothing critical yet, we're just calculating new values... */
    for (i = 0; i < 1000; i++)
        cache[i] = slow_calculation(i);
    
    /* Update shared memory map. */
    critical_begin();
    /* pthread_mutex_lock() */
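    /* Plain stores through the volatile pointer; memcpy() takes a
       non-volatile pointer and would discard the qualifier. */
    for (i = 0; i < 1000; i++)
        map[i] = cache[i];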
    /* pthread_mutex_unlock() */
    critical_end();
    

    If a TERM signal is delivered before the critical_begin(), the process is terminated then and there. If a TERM signal is delivered after that, but before the critical_end(), the call to critical_end() will terminate the process.

    This is just one pattern that can solve the underlying problem; there are others. The one with a single volatile sig_atomic_t done = 0; that the signal handler sets to nonzero, and the main processing loops check regularly, is even more common.
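
    A minimal sketch of that flag-based pattern follows. The names handle_done(), install_done(), run_until_done() and demo_work() are assumptions for illustration, and demo_work() simulates a TERM arriving during item 3 by raising it itself.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <signal.h>
    #include <stddef.h>
    #include <string.h>

    static volatile sig_atomic_t  done = 0;

    /* The handler only records the request; it never exits by itself. */
    static void handle_done(int signum)
    {
        (void)signum;
        done = 1;
    }

    /* Returns 0 on success, -1 (with errno set) on failure. */
    static int install_done(int signum)
    {
        struct sigaction  act;
        memset(&act, 0, sizeof act);
        sigemptyset(&act.sa_mask);
        act.sa_handler = handle_done;
        act.sa_flags = 0;  /* no SA_RESTART: blocking calls fail with EINTR,
                              so the loop gets a chance to check the flag */
        return sigaction(signum, &act, NULL);
    }

    /* Hypothetical main loop: runs work(i) for i = 0, 1, 2, ... and checks
     * the flag between items. Returns the number of items completed. */
    static size_t run_until_done(size_t max_items, void (*work)(size_t))
    {
        size_t  i;
        for (i = 0; i < max_items && !done; i++)
            work(i);
        return i;
    }

    /* Demo work item: pretends a TERM arrives while item 3 is processed. */
    static void demo_work(size_t i)
    {
        if (i == 3)
            raise(SIGTERM);
    }
    ```

    After install_done(SIGTERM), a call like run_until_done(1000, demo_work) finishes the item that was in progress when the signal arrived and then stops, so each item is either fully done or not started.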

    As pointed out by R.. in a comment, the pointer used to refer to the memory map should be a pointer to volatile (i.e., volatile some_type *map) to stop the compiler from reordering the stores to the memory map.