
A case of `all goroutines are asleep - deadlock!` I can't figure out why


TL;DR: A typical case of `all goroutines are asleep - deadlock!`, but I can't figure out the cause.

I'm parsing the Wiktionary XML dump to build a DB of words. I hand off the parsing of each article's text to a goroutine, hoping that it will speed up the process.
The dump is 7GB and is processed in under 2 minutes on my machine when done serially, but if I can take advantage of all cores, why not?

I'm new to threading in general, and I'm getting an `all goroutines are asleep - deadlock!` error.
What's wrong here?

This may not be performant at all: it uses an unbuffered channel, so all goroutines effectively end up executing serially. My goal, though, is to learn and understand threading and to benchmark how long it takes with different alternatives:

  • unbuffered channel
  • different sized buffered channel
  • only running as many goroutines at a time as runtime.NumCPU() reports

The summary of my code in pseudocode:

while tag := xml.getNextTag() {
    wg.Add(1)
    go parseTagText(chan, wg, tag.text)

    // consume a channel message if available
    select {
    case msg := <-chan:
        // do something with msg            
    default:
    }
}
// reading tags finished, wait for running goroutines, consume what's left on the channel
for msg := range chan {
    // do something with msg
}
// Sometimes this point is never reached, I get a deadlock
wg.Wait()

----

func parseTagText(chan, wg, tag.text) {
    defer wg.Done()
    // parse tag.text
    chan <- whatever // just inform that the text has been parsed
}

Complete code:
https://play.golang.org/p/0t2EqptJBXE


Solution

  • In your complete example on the Go Playground, you:

    • Create a channel (line 39, results := make(chan langs)) and a wait-group (line 40, var wait sync.WaitGroup). So far so good.

    • In a loop, sometimes spin off a task:

                  if ...various conditions... {
                      wait.Add(1)
                      go parseTerm(results, &wait, text)
                  }
      
    • In the loop, sometimes do a non-blocking read from the channel (as shown in your question). No problem here either. But...

    • At the end of the loop, use:

      for res := range results {
          ...
      }
      

      without ever calling close(results). The channel needs to be closed in exactly one place, after all writers finish. This loop uses a blocking read from the channel. As long as some writer goroutine is still running, the blocking read can block without the whole system stopping. But once the last writer finishes writing and exits, there are no writer goroutines left, and the main goroutine is still blocked in the read. Some other remaining goroutine could rescue you, but there are none, so the runtime reports the deadlock.
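To see why the close matters, here is a minimal sketch that is separate from your Playground code. The `collect` helper and the integer channel are made up for illustration. A `for ... range` over a channel ends only once the channel has been closed and drained:

```go
package main

import "fmt"

// collect is a hypothetical helper: it receives from ch until the channel
// is closed and returns everything it saw. If nobody ever closes ch, the
// range loop never ends.
func collect(ch <-chan int) []int {
	var got []int
	for v := range ch {
		got = append(got, v)
	}
	return got
}

func main() {
	ch := make(chan int, 3) // buffered, so the three sends below do not block
	ch <- 1
	ch <- 2
	ch <- 3

	// Without this close, main would block inside collect after the third
	// value, no other goroutine would exist, and the runtime would abort
	// with "all goroutines are asleep - deadlock!".
	close(ch)

	fmt.Println(collect(ch)) // prints [1 2 3]
}
```

Receiving from a closed channel still yields any buffered values first, which is why all three values come out before the loop ends.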

    Since you use the var wait correctly (adding 1 in the right place, and calling Done() in the right place in the writer), the solution is to add one more goroutine, which will be the one to rescue you:

    go func() {
        wait.Wait()
        close(results)
    }()
    

    You should spin off this rescuer goroutine just before entering the for res := range results loop. (If you spin it off any earlier, it might see the wait variable count down to zero too soon, just before it gets counted up again by spinning off another parseTerm.)

    This anonymous function will block in the wait variable's Wait() function until the last writer goroutine has called the final wait.Done(), which will unblock this goroutine. Then this goroutine will call close(results), which will arrange for the for loop in your main goroutine to finish, unblocking that goroutine. When this goroutine (the rescuer) returns and thus terminates, there are no more rescuers, but we no longer need any.
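Put together, the pattern might look like the sketch below. It is not your actual program: `parseTerm` here is a stand-in that just upper-cases its input, and `parseAll` is a hypothetical wrapper that mimics the shape of your loop (spawn, non-blocking read, then drain).

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// parseTerm is a stand-in for the real parser (an assumption for this
// sketch): it upper-cases the text and sends the result exactly once.
func parseTerm(results chan<- string, wait *sync.WaitGroup, text string) {
	defer wait.Done()
	results <- strings.ToUpper(text)
}

// parseAll is a hypothetical wrapper that follows the structure described
// above and returns how many results were received.
func parseAll(texts []string) int {
	results := make(chan string) // unbuffered, as in the question
	var wait sync.WaitGroup
	parsed := 0

	for _, text := range texts {
		wait.Add(1)
		go parseTerm(results, &wait, text)

		// Non-blocking read: take a result if one is ready, else move on.
		select {
		case <-results:
			parsed++
		default:
		}
	}

	// The rescuer: started only after every wait.Add(1) has happened.
	// It closes the channel once the last writer has called Done().
	go func() {
		wait.Wait()
		close(results)
	}()

	// Blocking reads; the loop ends when the rescuer closes the channel.
	for range results {
		parsed++
	}
	return parsed
}

func main() {
	fmt.Println(parseAll([]string{"a", "b", "c", "d"})) // prints 4
}
```

Each writer sends exactly one value and the main goroutine receives each value exactly once, either in the `select` or in the final loop, so the count always equals the number of inputs.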

    (The main code then calls wait.Wait() unnecessarily: since the for loop doesn't terminate until the wait.Wait() in the new goroutine has already unblocked, we know that this next wait.Wait() will return immediately. So we can drop this second call, although leaving it in is harmless.)
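The question also lists "as many goroutines as runtime.NumCPU()" as an alternative to benchmark. One possible way to do that, using the same close-after-Wait idea, is a fixed pool of workers fed by a jobs channel. Everything named here (`parseWithPool`, the `jobs` channel, using `len(text)` as the "parse" step) is an assumption made for the sketch, not something from your code.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parseWithPool is a hypothetical helper: it starts a fixed number of
// workers, feeds them texts over a jobs channel, and sums their results.
// The "parsing" is a placeholder that just measures the text length.
// It assumes workers >= 1; runtime.NumCPU() always satisfies that.
func parseWithPool(texts []string, workers int) int {
	jobs := make(chan string)
	results := make(chan int)
	var wait sync.WaitGroup

	for i := 0; i < workers; i++ {
		wait.Add(1)
		go func() {
			defer wait.Done()
			// Each worker exits when jobs is closed and drained.
			for text := range jobs {
				results <- len(text) // placeholder for real parsing
			}
		}()
	}

	// Producer: hands out the work, then closes jobs so the workers stop.
	go func() {
		for _, text := range texts {
			jobs <- text
		}
		close(jobs)
	}()

	// Same rescuer as before: close results once every worker is done.
	go func() {
		wait.Wait()
		close(results)
	}()

	total := 0
	for n := range results {
		total += n
	}
	return total
}

func main() {
	texts := []string{"go", "chan", "sync"}
	fmt.Println(parseWithPool(texts, runtime.NumCPU())) // prints 10
}
```

Here every wait.Add(1) happens before the rescuer starts, so the "too early" problem described above cannot occur. Whether this is actually faster than the serial version is something only a benchmark on the real dump can tell.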