go, concurrency, goroutine, gonum

Vectorise a function taking advantage of concurrency


For a simple neural network I want to apply a function to all the values of a gonum VecDense.

Gonum has an Apply method for Dense matrices, but not for vectors, so I am doing this by hand:

func sigmoid(z float64) float64 {                                           
    return 1.0 / (1.0 + math.Exp(-z))
}

func vSigmoid(zs *mat.VecDense) {
    for i := 0; i < zs.Len(); i++ {
        zs.SetVec(i, sigmoid(zs.AtVec(i)))
    }
}

This seems to be an obvious target for concurrent execution, so I tried

var wg sync.WaitGroup

func sigmoid(z float64) float64 {                                           
    wg.Done()
    return 1.0 / (1.0 + math.Exp(-z))
}

func vSigmoid(zs *mat.VecDense) {
    for i := 0; i < zs.Len(); i++ {
        wg.Add(1)
        go zs.SetVec(i, sigmoid(zs.AtVec(i)))
    }
    wg.Wait()
}

This doesn't work, perhaps not unexpectedly, as sigmoid() doesn't end with wg.Done(): the return statement (which does all the work) comes after it.

My question is: How can I use concurrency to apply a function to each element of a gonum vector?


Solution

  • First note that this attempt to do the computation concurrently assumes that the SetVec() and AtVec() methods are safe for concurrent use with distinct indices. If that is not the case, the attempted solution is inherently unsafe and may result in data races and undefined behavior.
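
    If you'd rather not rely on that guarantee, one option is to operate on the vector's raw backing slice instead. The sketch below is only illustrative (the name vSigmoidRaw is made up here, and it reuses sigmoid and the mat / sync imports from the question); each goroutine writes a distinct element of the slice, which is safe under Go's memory model, and the WaitGroup provides the synchronization needed to read the results afterwards:

    func vSigmoidRaw(zs *mat.VecDense) {
        raw := zs.RawVector() // blas64.Vector: Data slice plus stride Inc
        var wg sync.WaitGroup
        for i := 0; i < raw.N; i++ {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                // Each goroutine touches a distinct element of raw.Data,
                // so no two goroutines share a memory location.
                raw.Data[i*raw.Inc] = sigmoid(raw.Data[i*raw.Inc])
            }(i)
        }
        wg.Wait()
    }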


    wg.Done() should be called to signal that the "worker" goroutine has finished its work. But only once it actually has.

    In your case it is not the sigmoid() function that runs in the worker goroutine: in a go statement the arguments are evaluated in the calling goroutine, so sigmoid(zs.AtVec(i)) has already been computed by the time the new goroutine starts, and only zs.SetVec() runs in it. So you should call wg.Done() when zs.SetVec() has returned, not sooner.

    One way would be to add a wg.Done() call to the end of the SetVec() method (or a defer wg.Done() at its beginning), but introducing such a dependency would not be feasible: SetVec() should not know about any wait groups or goroutines, as that would seriously limit its usability.

    The easiest and cleanest way in this case is to launch an anonymous function (a function literal) as the worker goroutine, in which you call zs.SetVec(), and then call wg.Done() once it has returned.

    Something like this:

    for i := 0; i < zs.Len(); i++ {
        wg.Add(1)
        go func() {
            zs.SetVec(i, sigmoid(zs.AtVec(i)))
            wg.Done()
        }()
    }
    wg.Wait()
    

    But this alone won't work, as the function literal (closure) refers to the loop variable, which is modified by the loop while the goroutines run. The function literal should therefore work with its own copy, e.g.:

    for i := 0; i < zs.Len(); i++ {
        wg.Add(1)
        go func(i int) {
            zs.SetVec(i, sigmoid(zs.AtVec(i)))
            wg.Done()
        }(i)
    }
    wg.Wait()
    

    Also note that goroutines, although they may be lightweight, do have overhead. If the work they do is "small", the overhead may outweigh the performance gain of utilizing multiple cores / threads, and overall you might not gain performance by executing such small tasks concurrently (you may even do worse than without goroutines). Measure.
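
    For example, a rough benchmark sketch like the one below can tell you whether the concurrent version actually wins for your vector sizes. It assumes the sequential vSigmoid from the question and that the goroutine-per-element loop above is wrapped in a function called vSigmoidConcurrent (both names are just placeholders for this sketch); run it with go test -bench .

    func BenchmarkVSigmoidSequential(b *testing.B) {
        zs := mat.NewVecDense(10000, nil) // zero-valued vector of length 10000
        for n := 0; n < b.N; n++ {
            vSigmoid(zs) // plain loop from the question
        }
    }

    func BenchmarkVSigmoidConcurrent(b *testing.B) {
        zs := mat.NewVecDense(10000, nil)
        for n := 0; n < b.N; n++ {
            vSigmoidConcurrent(zs) // goroutine-per-element version above
        }
    }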

    Also, since you are using goroutines to do minimal work, you may improve performance by not "throwing away" goroutines once they're done with their "tiny" work, but "reusing" them instead. See related question: Is this an idiomatic worker thread pool in Go?
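
    A middle ground between one goroutine per element and a full worker pool is to split the vector into a handful of large chunks, roughly one per CPU, so the per-goroutine overhead is amortized over many elements. A minimal sketch (the name vSigmoidChunked is made up here; it reuses sigmoid from the question, needs the runtime and sync packages, and carries the same SetVec() / AtVec() concurrency-safety caveat noted above):

    func vSigmoidChunked(zs *mat.VecDense) {
        n := zs.Len()
        workers := runtime.NumCPU()
        chunk := (n + workers - 1) / workers // ceiling division

        var wg sync.WaitGroup
        for start := 0; start < n; start += chunk {
            end := start + chunk
            if end > n {
                end = n
            }
            wg.Add(1)
            go func(start, end int) {
                defer wg.Done()
                for i := start; i < end; i++ {
                    zs.SetVec(i, sigmoid(zs.AtVec(i)))
                }
            }(start, end)
        }
        wg.Wait()
    }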