I have a big loop of computational tasks that can be parallelized. For this purpose I decided to write a simple concurrent thread pool using GCD, since I'm working on iOS. My thread pool is fairly simple; I'll attach only the .m file, which should be enough to understand the idea:
#import "iOSThreadPool.h"

@interface iOSThreadPool ()
{
    int _timeout;
    int _currentThreadId;
    NSMutableArray<dispatch_queue_t> *_pool;
    NSMutableArray<dispatch_semaphore_t> *_semaphores;
    dispatch_group_t _group;
}
@end

@implementation iOSThreadPool

- (instancetype)initWithSize:(int)threadsCount tasksCount:(int)tasksCount
{
    self = [super init];
    if (self) {
        _timeout = 2.0;
        _currentThreadId = 0;
        _pool = [NSMutableArray new];
        _semaphores = [NSMutableArray new];
        for (int i = 0; i < threadsCount; i++) {
            dispatch_queue_attr_t attr = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_CONCURRENT, QOS_CLASS_BACKGROUND, 0);
            dispatch_queue_t queue = dispatch_queue_create([NSString stringWithFormat:@"com.workerQueue_%d", i].UTF8String, attr);
            [_pool addObject:queue];
            dispatch_semaphore_t sema = dispatch_semaphore_create(tasksCount);
            [_semaphores addObject:sema];
        }
        _group = dispatch_group_create();
    }
    return self;
}

- (void)async:(iOSThreadPoolBlock)block
{
    dispatch_group_enter(self->_group);
    __block dispatch_semaphore_t sema = _semaphores[_currentThreadId];
    dispatch_async(_pool[_currentThreadId], ^{
        dispatch_semaphore_wait(sema, dispatch_time(DISPATCH_TIME_NOW, (int64_t)(self->_timeout * NSEC_PER_SEC)));
        block();
        dispatch_semaphore_signal(sema);
        dispatch_group_leave(self->_group);
    });
    _currentThreadId = (_currentThreadId + 1) % _pool.count;
}

- (void)wait {
    dispatch_group_wait(_group, dispatch_time(DISPATCH_TIME_NOW, (int64_t)(self->_timeout * NSEC_PER_SEC)));
}

@end
So, basically, when I create the thread pool I set the thread count and the semaphore values. Since the queues are concurrent, I want to limit the number of tasks that can execute concurrently, so the threads won't be overwhelmed.
The thing is, no matter how many threads I create, it doesn't affect performance at all. I guess this happens because every dispatch queue's tasks end up in a global queue, and no matter how many queues I have, they all send their tasks to the same BACKGROUND queue most of the time.
I've read a lot about GCD and have used it successfully in practice. But whenever I try to go beyond the simple uses found in countless tutorials, like running a few parallelized processes with the intention of saving as much execution time as possible, I fail. And when I searched for more detailed explanations or more advanced, efficient GCD techniques, I found nothing. It seems that 90% of the time it's used in a very simple way. At the same time, I hear that GCD is a very powerful multithreading framework, so clearly I just don't know how to use it properly.
So my question is: is it really possible to launch a few parallelized processes on iOS? What should I change in my thread pool to make it efficient?
NOTE: I downloaded a C++ version of ThreadPool based on std::thread, and when I change the thread count in that pool, I clearly see a performance bump. I would highly appreciate it if some GCD guru could show me how to use GCD to its maximum capacity.
GCD already does thread pooling (dispatch queues draw upon a shared pool of "worker" threads), so it's redundant and inefficient to add another layer of pooling on top of it.
You say:
The thing is - no matter how much threads I'm creating, it doesn't affect the performance at all.
That could be any of a number of things. One common problem is that the unit of work is too small. As Performing Loops Concurrently says:
You should make sure that your task code does a reasonable amount of work through each iteration. As with any block or function you dispatch to a queue, there is overhead to scheduling that code for execution. If each iteration of your loop performs only a small amount of work, the overhead of scheduling the code may outweigh the performance benefits you might achieve from dispatching it to a queue.
But there are a variety of other possible problems, ranging from inefficient synchronization code to cache sloshing. It is impossible to say without a reproducible example of the problem. While QoS also has an impact, it is often negligible compared to these algorithmic issues.
You say:
Since queues are concurrent I want to limit tasks count which can be executed concurrently so thread would not be overwhelmed.
While you can achieve this with either non-zero dispatch semaphores or an NSOperationQueue with some maxConcurrentOperationCount, dispatch_apply (known as concurrentPerform to Swift users) is the go-to solution for computationally intensive, parallelized routines that balance workloads across CPU cores. It automatically looks at how many cores you've got and distributes the loop across them, without risking an explosion of threads. And, as outlined in Improving on Loop Code, you can experiment with strides that balance the amount of work done on each thread against the inherent overhead of thread coordination. (Striding can also minimize cache contention.)
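To illustrate, here is a minimal sketch of dispatch_apply with striding. The expensiveComputation function is a hypothetical stand-in for your loop body, and the stride of 1,000 is just an assumed starting point that you would tune empirically:

```objc
#import <Foundation/Foundation.h>

// Hypothetical unit of work; stands in for whatever your loop body computes.
static float expensiveComputation(size_t i) {
    return sqrtf((float)i);
}

void computeInParallel(float *results, size_t count) {
    size_t stride = 1000;                // assumed value; tune empirically
    size_t iterations = count / stride;  // assumes count is a multiple of stride

    // dispatch_apply distributes the iterations across the available cores
    // and does not return until all of them have finished.
    dispatch_apply(iterations, dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^(size_t i) {
        // Each dispatched iteration handles a whole stride of work, so the
        // per-block scheduling overhead is amortized over many computations.
        for (size_t j = i * stride; j < (i + 1) * stride; j++) {
            results[j] = expensiveComputation(j);
        }
    });
}
```

Compare this against a plain sequential loop with varying stride values; if the stride is too small, scheduling overhead dominates, and if it is too large, the cores are not evenly utilized.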
I might suggest researching dispatch_apply and giving it a try. If you're still unclear at that point, just post a new question showing both the non-parallel routine and the parallelized rendition, and we can help further.
As I said above, I don't think you want this routine at all. For computationally intensive routines, I would favor dispatch_apply. For simple queues where I want to control the degree of concurrency (especially if some of the tasks are themselves asynchronous), I'd use an NSOperationQueue with a maxConcurrentOperationCount. But I thought I'd share a few observations on your code snippet:
What you've implemented is a pool of queues, not a pool of threads.

What you're calling threadsCount is not a count of threads, but rather a count of queues. So, if you create a pool with a count of 10 and a tasksCount of 20, you're potentially using 200 threads.

Likewise, what you're calling _currentThreadId is not the current thread; it's the current queue.

The interaction with _currentThreadId is not thread-safe.
Bottom line, GCD has its own pool of threads, so you shouldn't reproduce that logic. All you need to do is implement the "no more than threadCount at a time" logic, which can be achieved with a non-zero dispatch semaphore. Thus, I'd suggest simplifying this to something like:
@interface ThreadPool ()

@property (nonatomic, strong) dispatch_queue_t pool;
@property (nonatomic, strong) dispatch_queue_t scheduler;
@property (nonatomic, strong) dispatch_semaphore_t semaphore;

@end

@implementation ThreadPool

- (instancetype)initWithThreadCount:(int)threadCount {
    self = [super init];
    if (self) {
        NSString *identifier = [[NSUUID UUID] UUIDString];
        NSString *bundleIdentifier = [[NSBundle mainBundle] bundleIdentifier];

        // Serial queue on which blocks wait their turn, so the caller is
        // never blocked and the pool never exceeds threadCount blocks in flight.
        NSString *schedulingLabel = [NSString stringWithFormat:@"%@.scheduler.%@", bundleIdentifier, identifier];
        _scheduler = dispatch_queue_create(schedulingLabel.UTF8String, DISPATCH_QUEUE_SERIAL);

        // Concurrent queue on which the work actually runs.
        NSString *poolLabel = [NSString stringWithFormat:@"%@.pool.%@", bundleIdentifier, identifier];
        dispatch_queue_attr_t attr = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_CONCURRENT, QOS_CLASS_BACKGROUND, 0);
        _pool = dispatch_queue_create(poolLabel.UTF8String, attr);

        // Non-zero semaphore limits the degree of concurrency.
        _semaphore = dispatch_semaphore_create(threadCount);
    }
    return self;
}

- (void)async:(ThreadPoolBlock)block {
    dispatch_async(self.scheduler, ^{
        // Block the scheduler (not the caller) until a "slot" frees up.
        dispatch_semaphore_wait(self.semaphore, DISPATCH_TIME_FOREVER);
        dispatch_async(self.pool, ^{
            block();
            dispatch_semaphore_signal(self.semaphore);
        });
    });
}

@end
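For completeness, usage might look something like this. Here doWork is a hypothetical synchronous function, and ThreadPoolBlock is assumed to be declared in the header as a typedef for a void (^)(void) block:

```objc
// At most 4 of the submitted blocks will run concurrently;
// the rest wait on the scheduler queue until a slot frees up.
ThreadPool *pool = [[ThreadPool alloc] initWithThreadCount:4];

for (int i = 0; i < 100; i++) {
    [pool async:^{
        doWork(i);  // hypothetical synchronous, computationally intensive work
    }];
}
```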
Needless to say, this implementation, like yours, assumes that the block passed to the async method is itself synchronous (e.g., it's not starting yet another asynchronous process like a network request or whatever). I suspect you know that, but I mention it only for the sake of completeness.