Search code examples
goconcurrencygoroutine

How to check for goroutine completion while actively reading from channel?


Had a hard time expressing this question in a sentence. Here is the situation:

I'm trying to spawn off a set of goroutines to recurse over directories and find matching files. Then I collect those files and continue on to process them. However, the catch is that I don't know how many files each routine will find, so I'm having a hard time figuring out how to get the main thread to exit once all the routines are done

I could just make the channel buffer crazy big but that's not a good solution, this tool doesn't need to be 100% robust but good enough where it's not breaking all the time. Plus there's a chance it could turn up a lot of files

// start a routine to traverse each directory
fpchan := make(chan string, 100)
for _, dir := range numDirs {
    fmt.Printf("Searching for file in %s\n", dir)
    go findLogs(searchString, dir, fpchan)
}

// collect filepaths from channel
files := make([]string, 0, maxLogs)
for file := range fpchan { // I'LL GET STUCK WHEN EVERYTHING COMPLETES, NOTHING TO RECEIVE
    if len(files) <= cap(files) {
        files.append(file)
    } else {
        fmt.Println("Reached max logfile count of %d\n", maxLogs)
    }
}

A waitgroup doesn't really work because the channel could fill up and the routines would be stuck (since I don't know how many results there will be ahead of time)

Is it kosher to send an empty string on the channel as a way for the goroutine to signal "complete"? Like the following:

// collect filepaths from channel
files := make([]string, 0, maxLogs)
for file := range fpchan {
    if file == "" {
        fmt.Println("goroutine finished")
        numRunning--
        if numRunning == 0 {
             break
        }
        continue
    }
    if len(files) <= cap(files) {
        files.append(file)
    } else {
        fmt.Println("Reached max logfile count of %d\n", maxLogs)
    }
}

For this situation since an empty filepath would be invalid, it would work as a poor man's signal. It just feels like a hack for which there should be a better solution I would think

Any way to do this in a way that isn't horribly complicated (bunch of extra channels, non-blocking receives, etc)? If it's much more complicated than this I'd just do them in sequence but I thought it'd be a good chance to take advantage of concurrency


Solution

  • You can use a WaitGroup here. But you need to wait for the waitgroup in a goroutine, and in that goroutine you close the channel when the waitgroup has completed, so that your main loop terminates:

    var wg sync.WaitGroup
    // start a routine to traverse each directory
    fpchan := make(chan string, 100)
    for _, dir := range numDirs {
        fmt.Printf("Searching for file in %s\n", dir)
        wg.Add(1)
        go func(dir string) {
            findLogs(searchString, dir, fpchan)
            wg.Done()
        }(dir)
    }
    
    go func() {
        wg.Wait()
        close(fpchan)
    }()
    

    (your collection loop over fpchan remains the same)