c++pthreads posix load-balancing thread-synchronization

How to achieve load balancing to threads?

I have a problem on how to load balancing between threads. Each thread should work on a specific rows from N*N matrix, for example if N = 4 and Num_Threads = 2. Note that this case only for (N%Num_Threads) == 0.

for (int i = 0 ; i < num_threads; i++){
    if ( i == 0) { RANGE -> R1 = 0; RANGE -> R2 = n/num_threads; }
    else {
        RANGE -> R1 = RANGE -> R2  ; // I have defined a struct previously contains R1,R2 which means Row1,Row2
        RANGE -> R2 = RANGE -> R1 + n/num_threads ; }
        cout << "ThreadID= " << i << ", startRow= " << RANGE -> R1 << ", endRow= " << RANGE -> R2 << endl ;   
        pthread_create(&threads[i],NULL,Median,RANGE);
    }
}

Output:

ThreadID= 0 start_row = 0  , end_row = 
ThreadID= 1 ..... = 2 , .....=4

**But how can I do load balancing if (N%Num_Threads) != 0 & Num_Threads < N. For Example if N = 5 and Num_Threads = 3. **

I have tried to give each Thread (N/Num_threads) row but how should I divide the remaining (N%Numthreads) rows over the threads ?

For example if N = 5 and Num_threads = 3 then there will be 2 rows left how can I distribute them over the 3 threads ? and if it was a big numbers like N = 101 and Num_Threads=7 it will be harder to do I have no idea on how to achieve this.

NOTE: not all threads should take the same number of rows but the code should TRY to achieve as much load balancing as possible between threads.

Solution

You already have and are using an object to convey to each thread the range of rows it whould operate upon, so I think all you're asking for is how to compute those ranges such that their sizes do not differ from each other by more than one item.

Suppose you have N items to distribute across T threads. All threads will get at least N / T items, N % T threads will each get one more. There are lots of ways to write code for that, but this one is pretty clear and simple:

size_t next_range_start = 0;

for (int i = 0 ; i < num_threads; i++) {
    size_t next_range_end_exclusive = next_range_start + n / num_threads;

    if (i < (n % num_threads)) {
        next_range_end_exclusive++;
    }

    cout << "ThreadID = " << i
         << ", startRow = " << next_range_start
         << ", endRow (exclusive) = " << next_range_end_exclusive
         << endl ;
    pthread_create(&threads[i], NULL, Median,
            new range_type(next_range_start, next_range_end_exclusive));
    next_range_start = next_range_end_exclusive;
}

Note well that to use the same range range object for different threads, you would need sufficient synchronization and signaling in place to ensure that each worker thread reads the data it needs before the main thread updates the object with values for the next worker thread. Using a separate range object for each thread, as shown, is a lot easier. As presented, however, it does put the responsibility on each worker thread to free the range object it receives.

Note also that the above assumes that the end index of the range objects is exclusive, not inclusive. Your original code is suggestive of that, but a bit ambiguous. An inclusive end index could also be used, with some adjustments.