macos swift parallel-processing grand-central-dispatch

Using Grand Central Dispatch in Swift to parallelize and speed up “for" loops?

I am trying to wrap my head around how to use GCD to parallelize and speed up Monte Carlo simulations. Most/all simple examples are presented for Objective C and I really need a simple example for Swift since Swift is my first “real” programming language.

The minimal working version of a monte carlo simulation in Swift would be something like this:

import Foundation

import Cocoa
var winner = 0
var j = 0
var i = 0
var chance = 0
var points = 0
for j=1;j<1000001;++j{
    var ability = 500

    var player1points = 0

    for i=1;i<1000;++i{
        chance = Int(arc4random_uniform(1001))
        if chance<(ability-points) {++points}
        else{points = points - 1}
    }
    if points > 0{++winner}
}
    println(winner)

The code works directly pasted into a command line program project in xcode 6.1

The innermost loop cannot be parallelized because the new value of variable “points” is used in the next loop. But the outermost just run the innermost simulation 1000000 times and tally up the results and should be an ideal candidate for parallelization.

So my question is how to use GCD to parallelize the outermost for loop?

Solution

A "multi-threaded iteration" can be done with dispatch_apply():

let outerCount = 100    // # of concurrent block iterations
let innerCount = 10000  // # of iterations within each block

let the_queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_apply(UInt(outerCount), the_queue) { outerIdx -> Void in
    for innerIdx in 1 ... innerCount {
       // ...
    }
}

(You have to figure out the best relation between outer and inner counts.)

There are two things to notice:

arc4random() uses an internal mutex, which makes it extremely slow when called from several threads in parallel, see Performance of concurrent code using dispatch_group_async is MUCH slower than single-threaded version. From the answers given there, rand_r() (with separate seeds for each thread) seems to be faster alternative.
The result variable winner must not be modified from multiple threads simultaneously. You can use an array instead where each thread updates its own element, and the results are added afterwards. A thread-safe method has been described in https://stackoverflow.com/a/26790019/1187415.

Then it would roughly look like this:

let outerCount = 100     // # of concurrent block iterations
let innerCount = 10000   // # of iterations within each block

let the_queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

var winners = [Int](count: outerCount, repeatedValue: 0)
winners.withUnsafeMutableBufferPointer { winnersPtr -> Void in

    dispatch_apply(UInt(outerCount), the_queue) { outerIdx -> Void in
        var seed = arc4random() // seed for rand_r() in this "thread"

        for innerIdx in 1 ... innerCount {
            var points = 0
            var ability = 500

            for i in 1 ... 1000 {
                let chance = Int(rand_r(&seed) % 1001)
                if chance < (ability-points) { ++points }
                else {points = points - 1}
            }
            if points > 0 {
                winnersPtr[Int(outerIdx)] += 1
            }
        }
    }
}

// Add results:
let winner = reduce(winners, 0, +)
println(winner)