Search code examples
gogoroutinewaitgroup

Writing to chan within sync.WaitGroup goroutine


I am fetching a list of items from an API endpoint. Then for each item I make another API request to get data about the individual item.

I can't make the second API request for every item concurrently, because my API token has a rate limit and I'll get throttled if I make too many requests at the same time.

However the initial API response data can be split into pages, which allows me to process pages of data concurrently.

After doing some research, the code below does exactly what I want:

func main() {
    // pretend paginated results from initial API request
    page1 := []int{1, 2, 3}
    page2 := []int{4, 5, 6}
    page3 := []int{7, 8, 9}
    pages := [][]int{page1, page2, page3}

    results := make(chan string)

    var wg sync.WaitGroup
    for i := range pages {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            for j := range pages[i] {
                // simulate making additional API request and building the report
                time.Sleep(500 * time.Millisecond)

                result := fmt.Sprintf("Finished creating report for %d", pages[i][j])
                results <- result
            }

        }(i)
    }

    go func() {
        wg.Wait()
        close(results)
    }()

    for result := range results {
        fmt.Println(result)
    }
}

I'd like to understand why this is what makes it work:

go func() {
    wg.Wait()
    close(results)
}()

My first try didn't work -- I thought I could range over the channel after wg.Wait() and I'd read the results as they were written to the results channel.

func main() {
    // pretend paginated results from initial API request
    page1 := []int{1, 2, 3}
    page2 := []int{4, 5, 6}
    page3 := []int{7, 8, 9}
    pages := [][]int{page1, page2, page3}

    results := make(chan string)

    var wg sync.WaitGroup
    for i := range pages {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            for j := range pages[i] {
                // simulate making additional API request and building the report
                time.Sleep(500 * time.Millisecond)

                result := fmt.Sprintf("Finished creating report for %d", pages[i][j])
                results <- result
            }

        }(i)
    }

    // does not work
    wg.Wait()
    close(results)

    for result := range results {
        fmt.Println(result)
    }
}

Solution

  • In your first attempt:

    1. Main goroutine makes 3 goroutines to put values in the result channel.
    2. Main goroutine waits for all of them to finish.
    3. One of the goroutines puts a value in the result channel and fills the channel up (channel size 1 string).
    4. All three goroutine now cannot put value in to the result channel anymore and goes to sleep till the result channel is freed up.
    5. All goroutines are asleep. You have deadlock.

    In your second attempt:

    1. Main goroutine makes 4 goroutines.
    2. 3 goroutines put values in the result channel.
    3. Other goroutine (I'll refer to this as 4th) waits for these 3 to end.
    4. Meanwhile main goroutine waits for values in result channel (for loop)
    5. In this case if one of the goroutine puts a value in the result channel blocking the rest of the three goroutines; the main goroutine pulls the value out of the results channel, thus unblocking the other goroutines.
    6. So all the 3 goroutines put all their respective values and end
    7. Then the 4th goroutine closes the channel
    8. And main goroutine ends its for loop.