There are several times in this book talk about deadlock
regarding to incorrect usage of goroutine
and channel
, and I failed to grasp why the deadlock happen.
First thing I want to say is that, I know channel send&receive will block until there is items to be read or rooms to send items into, and maybe that's the deep reason of some deadlock
. But please enlighten me with some explanation according to the following excerpt from the book:
This code is to concurrently crawl urls, BFS style:
func main() {
worklist := make(chan []string)
// Start with the command-line arguments.
go func() { worklist <- os.Args[1:] }()
// Crawl the web concurrently.
seen := make(map[string]bool)
for list := range worklist {
for _, link := range list {
if !seen[link] {
seen[link] = true
go func(link string) {
worklist <- crawl(link)
}(link)
}
}
}
}
quoting book's second paragraph:
...the initial send of the command-line arguments to the worklist must run in its own goroutine to avoid deadlock, a stuck situation in which both the main goroutine and a crawler goroutine attempt to send to each other while neither is receiving...
suppose the initial send to worklist
is not in its own goroutin, I imagine it works like this:
worklist
, block until receivedfor range
receive the initial item, so unblock the worklist
channelworklist
, loop...So to my understanding, it won't block and deadlock. Where am I wrong?
UPDATE: @mkopriva helped me realized since step 1 is blocked, step 2,3 is unreachable. So I'm clear on this one.
This deadlock situation might be the same as in page 240:
func main() {
worklist := make(chan []string) // list of URLs, may have duplicates
unseenLinks := make(chan string) // de-duplicated URLs
// Add command-lin arguments to worklist.
go func() { worklist <- os.Args[1:] }()
// Create 20 crawler goroutines to fetch each unseen link.
for i := 0; i < 20; i++ {
go func() {
for link := range unseenLinks {
foundLinks := crawl(link)
go func() { worklist <- foundLinks }()
}
}()
}
// The main goroutine de-duplicates worklist items
// and sends the unseen ones to the crawlers.
seen := make(map[string]bool)
for list := range worklist {
for _, link := range list {
if !seen[link] {
seen[link] = true
unseenLinks <- link
}
}
}
}
So if I put omit go
inside the for-range
loop, how the deadlock happen?
In the first snippet, the initial channel send needs to be done in a goroutine because without a goroutine the statement would block indefinitely and the execution would never reach the loop that receives from that channel. i.e. To get from 1.
to 2.
, 1.
needs to be done in a goroutine. If 1.
blocks however, then 2.
is never reached.
Where the comments start is where the execution stops:
func main() {
worklist := make(chan []string)
worklist <- os.Args[1:]
//
// seen := make(map[string]bool)
// for list := range worklist {
// for _, link := range list {
// if !seen[link] {
// seen[link] = true
// go func(link string) {
// worklist <- crawl(link)
// }(link)
// }
// }
// }
// }
In the second snippet you have a for-range
loop over a channel, such a loop will NOT exit until the ranged-over channel is closed. That means that, while the body of such a loop may continue to get executed, the code after the loop-with-unclosed-channel will never be reached.
https://golang.org/ref/spec#For_range
- For channels, the iteration values produced are the successive values sent on the channel until the channel is closed. If the channel is nil, the range expression blocks forever.
Where the comments start is where the execution stops:
func main() {
worklist := make(chan []string)
unseenLinks := make(chan string)
go func() { worklist <- os.Args[1:] }()
for i := 0; i < 20; i++ {
for link := range unseenLinks {
// foundLinks := crawl(link)
// go func() { worklist <- foundLinks }()
// }
// }
//
// seen := make(map[string]bool)
// for list := range worklist {
// for _, link := range list {
// if !seen[link] {
// seen[link] = true
// unseenLinks <- link
// }
// }
// }
// }