
Why does my code work correctly when I run wg.Wait() inside a goroutine?


I have a list of URLs that I am scraping. I want to send every successfully scraped page into a channel and, when scraping is done, collect the results into a slice. I don't know how many fetches will succeed, so I cannot specify a fixed length. I expected the code to reach wg.Wait() and block there until every wg.Done() had been called, but it never reached the close(queue) statement. Looking for a similar problem, I came across this SO answer:

https://stackoverflow.com/a/31573574/5721702

where the author does something similar:

ports := make(chan string)
toScan := make(chan int)
var wg sync.WaitGroup

// make 100 workers for dialing
for i := 0; i < 100; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for p := range toScan {
            ports <- worker(*host, p)
        }
    }()
}

// close our receiving ports channel once all workers are done
go func() {
    wg.Wait()
    close(ports)
}()

As soon as I wrapped my wg.Wait() in a goroutine, close(queue) was reached:

urls := getListOfURLS()
activities := make([]Activity, 0, limit)
queue := make(chan Activity)
var wg sync.WaitGroup
for i, activityURL := range urls {
    wg.Add(1)
    go func(i int, url string) {
        defer wg.Done()
        activity, err := extractDetail(url)
        if err != nil {
            log.Println(err)
            return
        }
        queue <- activity
    }(i, activityURL)
}
// calling it like this without the goroutine causes the execution to hang
// wg.Wait()
// close(queue)

// calling it like this successfully waits
go func() {
    wg.Wait()
    close(queue)
}()
for a := range queue {
    // range blocks until a scraped activity is sent on queue
    // and ends once queue has been closed
    activities = append(activities, a)
}

Why does the code not reach the close if I don't use a goroutine for wg.Wait()? I would expect all of the deferred wg.Done() calls to run eventually, so that wg.Wait() returns. Does it have to do with receiving values from my channel?


Solution

  • You need to wait for the goroutines to finish in a separate goroutine because queue needs to be read from while they run. When you do the following:

    queue := make(chan Activity)
    for i, activityURL := range urls {
        wg.Add(1)
        go func(i int, url string) {
            defer wg.Done()
            activity, err := extractDetail(url)
            if err != nil {
                log.Println(err)
                return
            }
            queue <- activity // nothing is reading data from queue.
        }(i, activityURL)
    }
    
    wg.Wait() 
    close(queue)
    
    for a := range queue {
        activities = append(activities, a)
    }
    

    Each goroutine blocks at queue <- activity because queue is unbuffered and nothing is receiving from it: the range loop on queue runs in the main goroutine, and only after wg.Wait returns.

    wg.Wait unblocks only once all the goroutines return. But, as mentioned, every goroutine with a result to send is blocked at the channel send, so wg.Wait never returns and the program deadlocks.

    When you use a separate goroutine to wait, code execution actually reaches the range loop on queue.

    // The main goroutine is not blocked by wg.Wait here.
    go func() {
        wg.Wait()
        close(queue)
    }()
    

    The main goroutine then starts receiving from queue, which unblocks each goroutine at its queue <- activity statement so that it can run to completion and, in turn, call its deferred wg.Done.

    Once the waiting goroutine gets past wg.Wait, queue is closed and the main goroutine exits the range loop on it.