Tags: ios, swift, grand-central-dispatch, semaphore, dispatch-async

Maximum value for a semaphore?


For example, consider a loop that runs 1,000 times. What is the maximum semaphore value that keeps it fast and efficient without leading to a deadlock?

let group = DispatchGroup()
let queue = DispatchQueue(label: "com.num.loop", attributes: .concurrent)
let semaphore = DispatchSemaphore(value: 4)
for i in 1...1000 {
    semaphore.wait()
    group.enter()
    queue.async(group: group, execute: {
        doWork(i)                                    
        group.leave()
        semaphore.signal()
    })            
}

group.notify(queue: DispatchQueue.main) {
    // go on...
}

Solution

  • A couple of observations:

    1. You never want to exceed the maximum number of GCD worker threads per QoS. If you do, you may experience blocking, or even deadlocks, within your app. Last I checked, this limit was 64 threads.

    2. That having been said, there’s generally little benefit in exceeding the number of cores on your device.

    3. Often, we let GCD determine the maximum number of concurrent threads for us by using concurrentPerform, which is automatically optimized for the device. It also eliminates the need for semaphores or groups, often leading to less cluttered code:

      DispatchQueue.global().async {
          DispatchQueue.concurrentPerform(iterations: 1000) { i in
              doWork(i)                                    
          }
      
          DispatchQueue.main.async {
              // go on...
          }
      }
      

      concurrentPerform runs the 1,000 iterations in parallel but limits the number of concurrent threads to a level appropriate for your device, eliminating the need for the semaphore. And because concurrentPerform is itself synchronous, not returning until all iterations are done, there is no need for the dispatch group either. So dispatch the whole concurrentPerform to a background queue (it blocks the calling thread, so never call it from the main thread), and when it returns, perform your “completion code” (or, in your case, dispatch that code back to the main queue).

    4. While I’ve argued for concurrentPerform above, that only works if doWork performs its task synchronously (e.g. some compute operation). If it initiates something that is, itself, asynchronous, then we have to fall back to this semaphore/group technique. (Or, perhaps better, use asynchronous Operation subclasses on an OperationQueue with a reasonable maxConcurrentOperationCount, or Combine’s flatMap(maxPublishers:_:) with a reasonable limit on the count.)

      As for a reasonable threshold in this case, there is no magic number. You just have to perform some empirical tests to find a reasonable balance between the number of cores and whatever else might be going on within your app. For example, for network requests, we often use 4 or 6 as a maximum count, considering not only the diminishing benefit of exceeding that count but also the impact on our server if thousands of users happened to be submitting too many concurrent requests at the same time.

    5. In terms of “making it fast”, the choice of “how many iterations should be allowed to run concurrently” is only part of the decision-making process. The more critical issue quickly becomes ensuring that doWork does enough work to justify the modest overhead introduced by the concurrent pattern.

      For example, if processing an image that is 1,000×1,000 pixels, you could perform 1,000,000 iterations, each processing one pixel. But if you do that, you might find that it is actually slower than your non-concurrent rendition. Instead, you might have 1,000 iterations, each processing 1,000 pixels. Or you might have 100 iterations, each processing 10,000 pixels. This technique, called “striding”, often requires a little empirical research to find the right balance between how many iterations you perform and how much work each one does. (And, by the way, this striding pattern can often also prevent cache sloshing, a scenario that can arise if multiple threads contend for adjacent memory addresses.)

    6. Related to the prior point, these threads often need to synchronize their access to shared resources to keep that access thread-safe. That synchronization can introduce contention between the threads, so you will want to think carefully about how and when you synchronize.

      For example, rather than synchronizing multiple times within doWork, you might have each iteration update a local variable (where no synchronization is needed) and perform the synchronized update to the shared resource only once the local calculations are done. It is hard to be more specific in the abstract, as it depends largely on what doWork is doing, but this choice can easily impact overall performance.
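Points 1 and 2 above can be illustrated by sizing the question's semaphore from the device's core count. The following is a minimal sketch, not a drop-in replacement: `doWork(_:)` here is a hypothetical synchronous stand-in that returns an `Int` (the question's version returns nothing), and `throttledLoop(iterations:maxConcurrent:)` is a hypothetical helper. It uses `ProcessInfo.processInfo.activeProcessorCount` as the semaphore value and `group.wait()` so that it can return its results directly, which means it must be called from a background queue, never the main thread. Note that `queue.async(group:)` already associates each block with the group, so the separate `enter()`/`leave()` calls from the question are not needed.

```swift
import Foundation

// Hypothetical stand-in for the question's `doWork(_:)`: a small synchronous computation.
func doWork(_ i: Int) -> Int {
    return (0..<1_000).reduce(0) { $0 &+ $1 &* i }
}

/// Runs `iterations` calls to `doWork`, allowing at most `maxConcurrent` to run at once.
/// The calling thread blocks on the semaphore and the group, so call this from a background queue.
func throttledLoop(iterations: Int, maxConcurrent: Int) -> [Int] {
    let queue = DispatchQueue(label: "com.num.loop", attributes: .concurrent)
    let semaphore = DispatchSemaphore(value: maxConcurrent)
    let group = DispatchGroup()
    let lock = NSLock()                       // guards `output`
    var output = [Int](repeating: 0, count: iterations)

    for i in 0..<iterations {
        semaphore.wait()                      // wait for a free slot before dispatching more work
        queue.async(group: group) {           // async(group:) enters/leaves the group for us
            let value = doWork(i)
            lock.lock()
            output[i] = value
            lock.unlock()
            semaphore.signal()                // free the slot for the next iteration
        }
    }
    group.wait()                              // block until every iteration has finished
    return output
}

// Cap concurrency at the number of active cores rather than some larger number.
let cores = ProcessInfo.processInfo.activeProcessorCount
let results = throttledLoop(iterations: 1_000, maxConcurrent: cores)
print("cores: \(cores), results: \(results.count)")
```

For purely synchronous work like this, point 3's concurrentPerform remains the simpler option; the semaphore pattern earns its keep mainly when the work is asynchronous.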
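For point 4, where doWork starts something asynchronous, the semaphore must be signaled when the asynchronous work actually completes, not when the call that starts it returns. A minimal sketch under these assumptions: `startRequest(_:completion:)` is a hypothetical asynchronous task (simulated here with `asyncAfter`), `runThrottled(count:maxConcurrent:completion:)` is a hypothetical helper, and the limit of 4 follows the network-request figure mentioned above. It also records the peak number of tasks in flight so you can confirm the limit is respected.

```swift
import Foundation

// Hypothetical asynchronous task standing in for, e.g., a network request:
// it returns immediately and calls `completion` later on another queue.
func startRequest(_ i: Int, completion: @escaping (Int) -> Void) {
    DispatchQueue.global().asyncAfter(deadline: .now() + .milliseconds(5)) {
        completion(i * 2)
    }
}

/// Starts `count` asynchronous tasks, at most `maxConcurrent` in flight at a time,
/// and reports the observed peak and the number completed when all are done.
func runThrottled(count: Int, maxConcurrent: Int,
                  completion: @escaping (_ peak: Int, _ completed: Int) -> Void) {
    // The loop blocks on the semaphore, so run it on a background queue, never the main thread.
    DispatchQueue.global().async {
        let semaphore = DispatchSemaphore(value: maxConcurrent)
        let group = DispatchGroup()
        let lock = NSLock()                   // guards the counters below
        var inFlight = 0, peak = 0, done = 0

        for i in 0..<count {
            semaphore.wait()                  // wait until fewer than `maxConcurrent` are in flight
            group.enter()
            lock.lock(); inFlight += 1; peak = max(peak, inFlight); lock.unlock()
            startRequest(i) { _ in
                lock.lock(); inFlight -= 1; done += 1; lock.unlock()
                semaphore.signal()            // signal only when the async work has finished
                group.leave()
            }
        }
        group.notify(queue: .global()) {
            completion(peak, done)
        }
    }
}

let finished = DispatchSemaphore(value: 0)
var peak = 0, done = 0
runThrottled(count: 100, maxConcurrent: 4) { p, d in
    peak = p
    done = d
    finished.signal()
}
finished.wait()   // only so this stand-alone script waits; never block the main thread in an app
print("peak in flight: \(peak), completed: \(done)")
```

In an app, the completion closure would typically dispatch back to the main queue instead of signaling a semaphore; the final `finished.wait()` exists only so that this stand-alone script does not exit early.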
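The striding idea from point 5 might look like the following minimal sketch. The image is a hypothetical 1,000×1,000 grayscale buffer stored row by row in a `[UInt8]`, the brightening operation is arbitrary, and `rowsPerIteration` is an assumed starting value, giving 100 iterations of 10,000 pixels each rather than 1,000,000 single-pixel iterations.

```swift
import Foundation

// Hypothetical 1,000×1,000 grayscale image stored row by row in a flat array.
let width = 1_000
let height = 1_000
var pixels = [UInt8](repeating: 100, count: width * height)

// Stride: each iteration handles `rowsPerIteration` whole rows (10 rows = 10,000 pixels)
// instead of one iteration per pixel. Tune this value empirically.
let rowsPerIteration = 10
let iterations = (height + rowsPerIteration - 1) / rowsPerIteration   // round up: 100 iterations

pixels.withUnsafeMutableBufferPointer { rawBuffer in
    let buffer = rawBuffer   // immutable copy of the pointer, safe to share across threads
    DispatchQueue.concurrentPerform(iterations: iterations) { chunk in
        let startRow = chunk * rowsPerIteration
        let endRow = min(startRow + rowsPerIteration, height)
        // Iterations write to disjoint, contiguous ranges, so no locking is needed.
        for index in (startRow * width)..<(endRow * width) {
            buffer[index] = UInt8(min(255, Int(buffer[index]) + 50))   // brighten, clamped to 255
        }
    }
}
```

Because each iteration works on its own contiguous block of rows, threads only write near one another at the chunk boundaries, rather than constantly touching adjacent pixels as they would with one iteration per pixel.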
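Point 6's advice, accumulating locally and synchronizing once, can be sketched as follows. The data (`numbers`), the shared `total`, and the choice of `NSLock` as the synchronization mechanism are all assumptions for illustration; any lock or serial queue would serve. Each iteration sums its own chunk into `localSum` without any locking and takes the lock only once, to fold its partial result into `total`.

```swift
import Foundation

let numbers = Array(1...1_000_000)     // hypothetical input data
let lock = NSLock()                    // protects `total`
var total = 0                          // hypothetical shared resource

let iterations = 100                   // tune empirically, as with striding
let chunkSize = (numbers.count + iterations - 1) / iterations

DispatchQueue.concurrentPerform(iterations: iterations) { chunk in
    let start = chunk * chunkSize
    let end = min(start + chunkSize, numbers.count)
    guard start < end else { return }  // possible trailing empty chunk when sizes don't divide evenly

    // Accumulate into a local variable: no synchronization needed here.
    var localSum = 0
    for index in start..<end {
        localSum += numbers[index]
    }

    // Synchronize once per iteration rather than once per element.
    lock.lock()
    total += localSum
    lock.unlock()
}

print("total: \(total)")
```

Taking the lock once per chunk means 100 lock acquisitions instead of 1,000,000, which greatly reduces contention between the threads.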