Search code examples
goruntimescheduler

What is the relationship between gomaxprocs and process?


In method sync_runtime_canSpin of file /usr/local/go/src/runtime/proc.go, why the comment corresponding to gomaxprocs <= int32(sched.npidle+sched.nmspinning)+1 is GOMAXPROCS>1?Shouldn't sched.npidle+sched.nmspinning equal gomaxprocs?What is the function of this condition?


Solution

  • Recall that the purpose of active spinning in the implementation of Go's sync.Mutex type is to minimise the inefficiency associated with swapping goroutines when lock contention is discovered while attempting to acquire the mutex.

    This contrasts with a fully cooperative mutex whereby contention encountered when attempting to acquire it will cause the goroutine to immediately yield to the scheduler, which may find some other work to do. This was the behaviour of Go prior to version 1.5. The problem is that scheduling other work is expensive, and may result in cache misses and cache coherency traffic, which slows overall progress of the program.

    In Go 1.5, a blended approach was introduced. Where applicable, the goroutine will actively spin for a few cycles before yielding. Spinning may increase scheduling efficiency if the lock is released by the other goroutine soon, as it means the acquirer can immediately acquire the lock without the program incurring the overhead of scheduling another goroutine.

    why the comment corresponding to gomaxprocs <= int32(sched.npidle+sched.nmspinning)+1 is GOMAXPROCS>1

    With this background in mind, the Go sync.Mutex implementation calls sync_runtime_canSpin when the mutex cannot be immediately locked due to another goroutine holding it (ref). The result determines whether the goroutine should busy-wait, holding the CPU for some cycles to see if the lock is released soon. This does not occur if the mutex is starved, meaning it has not been acquired for some extended period (further discussion omitted here as not relevant).

    The purpose of the conditional check (gomaxprocs <= int32(sched.npidle+sched.nmspinning)+1) in this case is two-fold:

    1. it ensures that GOMAXPROCS is greater than 1
    2. it ensures that there is at least one other running P

    The first condition is necessary for spinning. If GOMAXPROCS == 1, only one goroutine can be scheduled at a time, and spinning simply occupies the CPU doing useless work that prevents the goroutine holding the lock from being scheduled. This simply delays lock release and wastes CPU time with no overall progress.

    The second condition ensures that some other P is running. If another P (hence, another goroutine) is executing, this provides the heuristic that the lock may be released soon – within the time that would be spent busy-waiting – and thus spinning may be worthwhile to see if that occurs. If no other P is running, all other goroutines may be blocked in system calls or on timers, and so the heuristic decision is that they will not return in a time that makes spinning worthwhile.

    As to GOMAXPROCS>1, the short answer is that this is implied by the second condition being true: there cannot be two running Ps if GOMAXPROCS=1. I have provided a full derivation of the condition below so you can see the thought process.


    Deriving the inequality

    Consider that a P may be in one of two states:

    • idle, i.e. the P is not currently executing any goroutine
    • running, i.e. the P is actively executing Go code in a goroutine

    Additionally, an M may be in spinning state if it is currently active, occupying CPU time while looking for work to do (ref). In such a case, it holds a P and so that P is not currently actively running Go code.

    The number of Ps is, by definition, equal to GOMAXPROCS: |P| == GOMAXPROCS. Call:

    P-idle     := sched.npidle
    M-spinning := sched.nmspinning
    

    We have that Ps can either be idle, running or assigned to an M that is spinning:

    |P| == GOMAXPROCS == P-idle + P-running + M-spinning

    If we want at least one other P to be working, we require P-running > 1, or in other words:

    • one P is the goroutine attempting to acquire the lock (the one inside sync_runtime_canSpin)
    • at least one other P is actively running some other Go code, which may be the goroutine holding the lock and may result in the lock being released soon.

    Thus, basic substitution and mathematical rearrangement yields:

    GOMAXPROCS - P-idle - M-spinning == P-running > 1
    GOMAXPROCS > P-idle + M-spinning + 1
    

    To invert the condition to identify cases where we should not spin (i.e. no other P is running), > becomes <= and we have GOMAXPROCS <= P-idle+M-spinning+1.

    In respect of the requirement that GOMAXPROCS > 1, suppose for purposes of argument that GOMAXPROCS = 1 and that the condition is true. Then this implies that P-running > 1, which is impossible, as there can only be one P. A logical contradiction. Thus, if the condition is true, GOMAXPROCS > 1.


    The other conditions in the if statement

    For completeness, the other two terms in the if statement conditional are:

    i >= active_spin implements the limited spinning behaviour – if a predetermined number of spin attempts do not result in the lock being acquired, no further spinning is performed to allow the worker thread (and CPU) to be yielded to other useful work in the program.

    ncpu <= 1 ensures a multicore system is used. On a single-core system, spinning will occupy all CPU time and so the goroutine holding the lock will not make progress towards releasing it during the spin (effectively similar to the case of GOMAXPROCS=1)

    and the subsequent if statement (p := getg().m.p; p.runqhead != p.runqtail) validates that the local P does not have additional runnable work that could be usefully performed instead.


    A note on heuristics

    As with all heuristics, this model attempts to approximate the most efficient execution model based on limited information available to the runtime and the impossibility of knowing exactly when the lock will be given up.

    Certain programs or interleavings of execution will, of course, result in suboptimal execution behaviour, but the intent of the optimisation is to speed up the common case when lock contention is high, critical sections short, and typically in single-producer/single-consumer cases, without significantly impairing the performance of the program and efficiency of CPU usage in other cases.

    The requirement that there is "some other active P" does not guarantee that the goroutine running in that P at the time we decide to spin is the one holding the lock, nor does it guarantee that the lock will be released soon. It is simply a heuristic that indicates the program may make progress and that the efficiency trade-off between yielding vs. waiting for the lock to be released merits spinning for some time in this case.

    References