c++ linux timer posix

Can POSIX timers safely modify C++ STL objects?


I'm attempting to write a C++ "wrapper" for the POSIX timer system on Linux, so that my C++ program can set timeouts for things (such as waiting for a message to arrive over the network) using the system clock, without dealing with POSIX's ugly C interface. It seems to work most of the time, but occasionally my program will segfault after several minutes of running successfully. The problem seems to be that my LinuxTimerManager object (or one of its member objects) gets its memory corrupted, but unfortunately the problem refuses to appear if I run the program under Valgrind, so I'm stuck staring at my code to try to figure out what's wrong with it.

Here's the core of my timer-wrapper implementation:

LinuxTimerManager.h:

#include <csignal>
#include <ctime>
#include <functional>
#include <map>
#include <set>

namespace util {

using timer_id_t = int;

class LinuxTimerManager {
private:
    timer_id_t next_id;
    std::map<timer_id_t, timer_t> timer_handles;
    std::map<timer_id_t, std::function<void(void)>> timer_callbacks;
    std::set<timer_id_t> cancelled_timers;
    friend void timer_signal_handler(int signum, siginfo_t* info, void* ucontext);
public:
    LinuxTimerManager();
    timer_id_t register_timer(const int delay_ms, std::function<void(void)> callback);
    void cancel_timer(const timer_id_t timer_id);
};

void timer_signal_handler(int signum, siginfo_t* info, void* ucontext);
}

LinuxTimerManager.cpp:

#include "LinuxTimerManager.h"
#include <cassert>

namespace util {

LinuxTimerManager* tm_instance;

LinuxTimerManager::LinuxTimerManager() : next_id(0) {
    tm_instance = this;
    struct sigaction sa = {0};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = timer_signal_handler;
    sigemptyset(&sa.sa_mask);
    int success_flag = sigaction(SIGRTMIN, &sa, NULL);
    assert(success_flag == 0);
}

void timer_signal_handler(int signum, siginfo_t* info, void* ucontext) {
    timer_id_t timer_id = info->si_value.sival_int;
    auto cancelled_location = tm_instance->cancelled_timers.find(timer_id);
    // Only fire the callback if the timer is not in the cancelled set
    if(cancelled_location == tm_instance->cancelled_timers.end()) {
        tm_instance->timer_callbacks.at(timer_id)();
    } else {
        tm_instance->cancelled_timers.erase(cancelled_location);
    }
    tm_instance->timer_callbacks.erase(timer_id);
    timer_delete(tm_instance->timer_handles.at(timer_id));
    tm_instance->timer_handles.erase(timer_id);
}

timer_id_t LinuxTimerManager::register_timer(const int delay_ms, std::function<void(void)> callback) {
    struct sigevent timer_event = {0};
    timer_event.sigev_notify = SIGEV_SIGNAL;
    timer_event.sigev_signo = SIGRTMIN;
    timer_event.sigev_value.sival_int = next_id;

    timer_t timer_handle;
    int success_flag = timer_create(CLOCK_REALTIME, &timer_event, &timer_handle);
    assert(success_flag == 0);
    timer_handles[next_id] = timer_handle;
    timer_callbacks[next_id] = callback;

    struct itimerspec timer_spec = {0};
    timer_spec.it_interval.tv_sec = 0;
    timer_spec.it_interval.tv_nsec = 0;
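    // Split the delay: tv_nsec must stay below 1,000,000,000, and
    // delay_ms * 1000000 would overflow int for delays over about 2.1 s
    timer_spec.it_value.tv_sec = delay_ms / 1000;
    timer_spec.it_value.tv_nsec = (delay_ms % 1000) * 1000000L;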
    timer_settime(timer_handle, 0, &timer_spec, NULL);

    return next_id++; 
}


void LinuxTimerManager::cancel_timer(const timer_id_t timer_id) {
    if(timer_handles.find(timer_id) != timer_handles.end()) {
        cancelled_timers.emplace(timer_id);
    }
}

}

When my program crashes, the segfault always comes from timer_signal_handler(), usually at the lines tm_instance->timer_callbacks.erase(timer_id) or tm_instance->timer_handles.erase(timer_id). The fault itself is raised from somewhere deep in the std::map implementation (i.e. stl_tree.h).

Could my memory corruption be caused by a race condition between different timer signals modifying the same LinuxTimerManager? I thought only one timer signal was delivered at a time, but maybe I misunderstood the man pages. Is it just generally unsafe to make a Linux signal handler modify a complex C++ object like std::map?


Solution

  • Yes, this is unsafe. A signal can arrive while the interrupted thread is in the middle of malloc or free, so nearly any container operation in the handler (std::map::erase, for example, frees a node) can reenter the allocator while its internal data structures are in an inconsistent state. malloc and free are only examples: as pointed out in the comments, most functions are not async-signal-safe and must not be called from a signal handler. Reentering a component this way leads to essentially arbitrary failures, which is consistent with the intermittent corruption inside std::map described in the question.

    A library cannot protect itself against this without blocking signals for the entire process during every operation it performs. That is prohibitively expensive, both in the overhead of managing the signal mask and in the amount of time signals would stay blocked. (Mutexes are not a substitute, because a signal handler should not block on locks. If a thread handling a signal calls into a library protected by mutexes while another thread, or the interrupted thread itself, holds a mutex the handler needs, the handler will block. Deadlock is very hard to avoid when this can happen.)

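    Designs that work around this typically use a dedicated thread that waits for a specific event and then does the processing. The signal handler only records that the event occurred and wakes that thread. A semaphore is the usual way to synchronize the two, because sem_post is one of the few synchronization calls that is async-signal-safe.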
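
As a rough illustration of that design, here is a minimal sketch in which the handler only copies the timer ID into a fixed-size ring of atomics and calls sem_post, while a worker thread does all of the std::map work. Everything in the sketch namespace (on_timer_signal, run_demo, kSlots, kTimers) is a hypothetical name, not part of the question's code. The sketch assumes that SIGRTMIN is blocked in every thread except one, so the handler never runs concurrently with itself, that std::atomic<int> is lock-free on the target, and that no more than kSlots expirations are ever pending at once. To keep the run deterministic, sigqueue stands in for timer expiry; a timer created with SIGEV_SIGNAL fills in si_value the same way.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <functional>
#include <map>
#include <thread>

#include <semaphore.h>
#include <signal.h>
#include <unistd.h>

namespace sketch {  // hypothetical names; not part of the original code

constexpr unsigned kSlots = 64;   // assumed bound on expirations pending at once
constexpr int kTimers = 3;        // number of simulated timers in the demo
std::atomic<int> ring[kSlots];    // timer IDs handed from the handler to the worker
std::atomic<unsigned> head{0};    // next slot to write; touched only by the handler
sem_t pending;                    // counts IDs waiting in the ring

// Handler: no allocation, no locks, no container access.
void on_timer_signal(int, siginfo_t* info, void*) {
    const int saved_errno = errno;            // sem_post may overwrite errno
    const unsigned slot = head.load();
    ring[slot % kSlots].store(info->si_value.sival_int);
    head.store(slot + 1);
    sem_post(&pending);                       // async-signal-safe wake-up
    errno = saved_errno;
}

// Queues kTimers signals and returns how many callbacks the worker ran.
int run_demo() {
    head.store(0);
    int rc = sem_init(&pending, 0, 0);
    assert(rc == 0);

    struct sigaction sa {};
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sa.sa_sigaction = on_timer_signal;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGRTMIN, &sa, nullptr);
    assert(rc == 0);
    (void)rc;

    // Block SIGRTMIN before starting the worker so the worker inherits the
    // block and the handler only ever runs on this (the calling) thread.
    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGRTMIN);
    pthread_sigmask(SIG_BLOCK, &block, nullptr);

    std::map<int, std::function<void()>> callbacks;
    int fired = 0;
    for (int id = 0; id < kTimers; ++id) callbacks[id] = [&fired] { ++fired; };

    // Worker: the only code that touches the map once signals can arrive.
    std::thread worker([&callbacks] {
        unsigned tail = 0;
        for (int n = 0; n < kTimers; ++n) {
            while (sem_wait(&pending) == -1 && errno == EINTR) {}
            const int id = ring[tail++ % kSlots].load();
            auto it = callbacks.find(id);
            if (it != callbacks.end()) {
                it->second();          // run the callback outside signal context
                callbacks.erase(it);   // safe here: ordinary thread, not a handler
            }
        }
    });

    pthread_sigmask(SIG_UNBLOCK, &block, nullptr);

    // Stand-in for timer expiry: queue one signal per timer ID.
    for (int id = 0; id < kTimers; ++id) {
        union sigval value;
        value.sival_int = id;
        sigqueue(getpid(), SIGRTMIN, value);
    }

    worker.join();
    sem_destroy(&pending);
    return fired;
}

}  // namespace sketch
```

Calling sketch::run_demo() should return 3, one per queued ID. In a real wrapper, register_timer and cancel_timer would probably also need a mutex shared with the worker thread (never with the handler), since they modify the same maps from other threads.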