
How can you avoid races when overriding Go's default signal handlers?


tl;dr: if the Go runtime can handle a signal at any time, how can we use signal.Ignore to ignore SIGINT without racing between the moment the default signal handler is installed and the moment our first statement inside main() runs?

The Go documentation for the os/signal package says this about the default behavior of signals:

A SIGHUP, SIGINT, or SIGTERM signal causes the program to exit.

So, write a golang binary that spins the CPU, hit ctrl+c to send SIGINT, and the program will exit.

Now, say you want to override that behavior. One way to do it would be to ignore the signal with signal.Ignore(syscall.SIGINT).

But now consider the following:

package main

import (
    "fmt"
    "os/signal"
    "syscall"
    "time"
)

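func main() {
    fmt.Println("Looping, SIGINT will have default behavior")
    // Busy-wait for 5 seconds; SIGINT still has its default (exit) behavior here.
    for start := time.Now(); time.Now().Before(start.Add(time.Second * 5)); {
    }
    // From this point on, SIGINT is ignored.
    signal.Ignore(syscall.SIGINT)
    fmt.Println("OK")

    // Busy-wait for another 5 seconds; ctrl+c no longer ends the program.
    for start := time.Now(); time.Now().Before(start.Add(time.Second * 5)); {
    }
}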

Here we have a simple Go binary that busy-loops for 5 seconds, so we know it never yields its OS thread through cooperative multitasking. It then calls signal.Ignore to ignore SIGINT and busy-loops for another 5 seconds.

If you try this you will notice that if you enter ctrl+c during the first 5 seconds, the program will exit. This seems to make sense - we are getting the default golang runtime signal handling behavior because we have not yet overridden it with our call to signal.Ignore.

Now maybe we could solve this problem by moving signal.Ignore to be the first thing in main(). However, what this program shows is that the Go runtime provides no guarantee that the default signal handler won't run before our synchronous code in main() has finished executing.

Even if you move it, we seem to be in a race between

  1. The point at which the go runtime registers its default signal handlers, and
  2. The earliest point at which our code can run (the first (or second if you need to make a channel) instruction in main)

I can't find documentation on this. What guarantees does the go runtime provide to make me feel absolutely sure a signal cannot arrive between stages 1 and 2?


Solution

  • TL;DR: catch the signal early, e.g., right at the top of main.


    As the comments said, this is not really a Go-specific problem; it applies to every programming language. If you haven't set a signal handler for syscall.SIGINT, SIGINT kills the process by default, and once you have, it doesn't. Any program can therefore be killed during its startup, before it has had a chance to start catching signals. This affects C and C++ programs just as much as it affects Go programs.

    In general, then, all you need to do to catch or discard SIGINT reliably, and as race-free as possible, is to call signal.Notify(ch, os.Interrupt) very early, near the top of your main, where ch is a buffered channel you make for this purpose (the signal package does not block when sending to it). [1] You can then write your own race-free code to deal with the signal via goroutines and channels:

    • Have your own goroutine read the channel to see if/when a signal is delivered.
    • Have it read another channel, or some shared memory area using your own mutex management, to see how to handle the signal if and when one is delivered.
    • When a signal is delivered, if you should exit, call os.Exit (or, probably better, use signal.Reset on syscall.SIGINT and then send yourself a syscall.SIGINT so that the process ends with the right OS-level exit status, "killed by signal 2" [2]). If the signal should be ignored, simply drop the channel notification.

    Somewhat related: there was a pretty recent fix for syscall.SIGPIPE handling. In particular, calling signal.Ignore(syscall.SIGPIPE) should ignore SIGPIPE, but didn't. This seems to be fixed in Go 1.14.


    [1] As the package documentation notes, deliberately catching a signal will bypass any "pre-ignored" state from nohup or trap "" 1 2 15 in the shell. If you wish to check for this, use the signal.Ignored function.

    [2] If signal_unix.go exported the dieFromSignal function, you might be able to use that directly, but it doesn't. It would be nice to have an OS-agnostic wrapper to at least attempt this kind of suicide cleanly. This could even use sigprocmask at the OS level to make the suicide as race-free as possible.