Search code examples
goconcurrencygoroutine

waiting for all go routines to finish


First time with go, and trying to get go routines and WaitGroups working.

I have a CSV file with 100 rows of data. (101 including header)

I have the following simple code:

package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
    "sync"
    "time"
)

func main() {
    start := time.Now()
    numRows := 0

    waitGroup := sync.WaitGroup{}
    file, _ := os.Open("./data.csv")

    scanner := bufio.NewScanner(file)
    scanner.Scan() // to read the header

    for scanner.Scan() {
        err := scanner.Err()

        if err != nil && err != io.EOF {
            panic(err)
        }

        waitGroup.Add(1)

        go (func() {
            numRows++
            waitGroup.Done()
        })()
    }

    waitGroup.Wait()
    file.Close()

    fmt.Println("Finished parsing ", numRows)
    fmt.Println("Elapsed time in seconds: ", time.Now().Sub(start))
}

When i run this, the numRows output fluctuates between 94 and 100 each time. I'm expecting it to be 100 each time. If i run the same code on a CSV of 10 rows of data, it outputs 10 each and every time.

Seems to me like the final few go routines aren't finishing in time.

I've tried the following which have failed:

  • using a CsvReader instead of a Scanner
  • moving waitGroup.Add(1) to underneath the anonymous func
  • moving the anonymous func out into a package-level scope func (and passed things round using ptrs)

What am i missing?


Solution

  • What do you do with this code:

    for scanner.Scan() {
        err := scanner.Err()
    
        if err != nil && err != io.EOF {
            panic(err)
        }
    
        waitGroup.Add(1)
    
        go (func() {
            numRows++
            waitGroup.Done()
        })()
    }
    

    Really all the work is done in one main goroutine and only numRows increment uses separate goroutines. I think this could be simplified to simple increment:

    for scanner.Scan() {
        err := scanner.Err()
    
        if err != nil && err != io.EOF {
            panic(err)
        }
        numRows++
    }
    

    If you want to simulate parallel parsing and pipelining you may use channels. Make only one goroutine responsible for counter increment. Every time when another goroutine wants to increment counter - it sends a message to that channel.

    https://play.golang.org/p/W60twJjY8P