Tags: go, concurrency, locking, mutex, goroutine

Go sync.Cond: fmt.Println after wg.Done ends in a deadlock


I can't understand this deadlock in Go. I have the code below, which uses a pub/sub pattern:

package main

import (
    "fmt"
    "sync"
)

func main() {
    cond := sync.NewCond(&sync.Mutex{})

    subscribe := func(c *sync.Cond, fn func()) {
        var goroutineRunning sync.WaitGroup
        goroutineRunning.Add(1)

        go func() {
            goroutineRunning.Done()
            fmt.Println("waiting")

            c.L.Lock()
            c.Wait()
            c.L.Unlock()

            fn()
        }()

        goroutineRunning.Wait()
    }

    var clickRegistered sync.WaitGroup
    clickRegistered.Add(2)

    subscribe(cond, func() {
        fmt.Println("notified 1")
        clickRegistered.Done()
    })

    subscribe(cond, func() {
        fmt.Println("notified 2")
        clickRegistered.Done()
    })

    cond.Broadcast()
    clickRegistered.Wait()
}

I can't understand why I get a deadlock when fmt.Println("waiting") comes after goroutineRunning.Done(). If I remove the fmt.Println("waiting"), or move it above goroutineRunning.Done(), the program works as expected. Why does this happen?

You may need to run go run main.go several times to hit the deadlock; it may work the first time.


Solution

  • In your code there are two possible situations at the time cond.Broadcast() is called:

    • Both goroutines are blocked at c.Wait(), and both are released.
    • One (or both) of the goroutines has not yet reached c.Wait().

    cond.Broadcast() wakes all goroutines that are waiting on cond at the moment it is called. A broadcast is not remembered, so, as per the above, a goroutine that has not yet reached c.Wait() is not waiting on cond and will not be released. If that happens, the goroutine blocks forever at c.Wait(), because there are no further calls to cond.Broadcast().

    So what we have here is a race. If both goroutines reach c.Wait() before the main goroutine gets to cond.Broadcast(), the program completes. If either does not make it, the program deadlocks (the main goroutine is blocked at clickRegistered.Wait() and one or two goroutines are blocked at c.Wait()). As a result, the behaviour of the program is unpredictable and depends on things like your OS, Go version, CPU architecture, and current load.

    There are some things you can do that may influence the behaviour. For example, adding a call to runtime.GOMAXPROCS(1) at the start of your program will probably (it's not guaranteed!) mean it completes, even with the fmt.Println. This is because, when only one goroutine can execute at a time, it is likely that the Go scheduler will let each new goroutine run until it blocks at c.Wait() (again, this is not guaranteed; the scheduler may preempt a goroutine before it gets there). With multiple CPU cores the goroutines will (probably) be running across several cores, so things become less predictable.

    Adding code above goroutineRunning.Done() will have no impact because the main goroutine cannot proceed beyond the goroutineRunning.Wait() before the goroutines call goroutineRunning.Done(). However, any delay (such as a call to fmt.Println("waiting")) after that point increases the chance that the goroutine will not reach c.Wait() before the main goroutine reaches cond.Broadcast() (leading to the deadlock).