Search code examples
objective-cmultithreadinggrand-central-dispatch

dispatch_apply leaves one thread "hanging"


I am experimenting with multithreading following Apples Concurrency Programming Guide. The multithreaded function (dispatch_apply) replacing the for-loop seems straightforward and works fine with a simple printf statement. However, if the block calls a more cpu-intensive calculation, the program never ends or executes past dispatch_apply, and one thread (main thread?) seems stuck at 100%.

#import <Foundation/Foundation.h>

#define THREAD_COUNT 16

unsigned long long longest = 0;
unsigned long long highest = 0;

void three_n_plus_one(unsigned long step);

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        
        dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        dispatch_apply(THREAD_COUNT, queue, ^(size_t i) {
            three_n_plus_one(i);
        });
    }
    return 0;
}

void three_n_plus_one(unsigned long step) {
    
    unsigned long long start = step;
    unsigned long long end = 1000000;
    unsigned long long identifier = 0;

    while (start <= end) {

        unsigned long long current = start;
        unsigned long long sequence = 1;

        while (current != 1) {
        
            sequence += 1;
            if(current % 2 == 0)
                current = current / 2;
            else {
                current = (current * 3) + 1;
            }

            if (current > highest) highest = current;
        }

        if (sequence > longest) {
            longest = sequence;
            identifier = start;
            printf("thread %lu, number %llu with %llu steps to one\n", step, identifier, longest);
        }

        start += (unsigned long long)THREAD_COUNT;
    }
}

Still, the loop seems to be finished. From what I understand, this should be fairly straight forward, still I'm left clueless as to what I'm doing wrong here.


Solution

  • What you're calling step is the index of the loop. It goes from 0 to THREAD_COUNT-1 in your code. Since you assign start to be step, that means your first iteration tries to compute the Colatz sequence starting at zero. That computes 0/2 == 0 and so is an infinite loop.

    What you meant to write is:

    unsigned long long start = step + 1;
    

    Calling your block size "THREAD_COUNT" is misleading. The question is not how many threads are created (no threads may be created; that's up to the system). The question is how many chunks to divide the work into.

    Note that reading and writing to longest and highest on multiple threads without synchronization is undefined behavior, so this code may do surprising things (particularly when optimized). Don't assume it's limited to getting the wrong values in longest and highest. The optimizer is allowed to assume no other thread touches those values while it runs, and can rearrange code dramatically based on that. But that's not the cause of this particular issue.