Search code examples
c++multithreadingopenmpbarrier

Parallel threads doing the same until a barrier, after barrier the result is not always the same


Several threads increment a value in an array in parallel. Every thread increments its own value, not changing the one of the others. After each incrementation we have a barrier so that threads wait for each other and continue to the next loop iteration only when all are done. This way, I suppose, every time all the threads reach the barrier, the values in every array element are equal. However, it is not always the case. Why is this not always the case?

The code:

#include <iostream>
#include <omp.h>
#include <vector>
#include <chrono>

int main() {
    const int num_threads = 4;
    const auto start_time = std::chrono::high_resolution_clock::now();
    std::vector<int> arr(num_threads, 0);

    #pragma omp parallel num_threads(num_threads)
    {
        int thread_num = omp_get_thread_num();
        while (true) {
            // Check elapsed time
            auto now = std::chrono::high_resolution_clock::now();
            auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(now - start_time).count();
            if (elapsed >= 10) break;

            
            arr[thread_num]++;
            std::cout << "Thread " << thread_num << " done grinding.\n";
            
            #pragma omp barrier // Wait for other threads

            // Read other threads' values
            for (int i = 0; i < num_threads; ++i) {
                if (i != thread_num && arr[i] != arr[thread_num]) {
                    std::cout << "Thread " << thread_num << ": My value is " << arr[thread_num] 
                              << ", but thread " << i << "'s value is " << arr[i] << ".\n";
                }
            }
        }
    }

    std::cout << "Final array values:\n";
    for (int i = 0; i < num_threads; ++i) {
        std::cout << "Index " << i << ": " << arr[i] << "\n";
    }

    return 0;
}

The output (the end of it):

Thread 3 done grinding.
Thread 2 done grinding.
Thread 1: My value is 271402, but thread 2's value is 271403.
Thread 1 done grinding.
Thread 3: My value is 271402, but thread 1's value is 271403.
Thread 3: My value is 271402, but thread 2's value is 271403.
Thread 0: My value is 271402, but thread 2's value is 271403.
Thread 1: My value is 271403, but thread 0's value is 271402.
Thread 1: My value is 271403, but thread 3's value is 271402.
Thread 2: My value is 271403, but thread 0's value is 271402.
Thread 2: My value is 271403, but thread 3's value is 271402.
Final array values:
Index 0: 271402
Index 1: 271403
Index 2: 271403
Index 3: 271402

Solution

  • The code as written hangs nearly every time I run it, never finishing, because one of the threads gets stuck at the barrier waiting for the others that already terminated. This happens because some thread checks elapsed at 9.99999 seconds and proceeds but another thread checks at 10 and exits. This is also why the finish count can be higher on some threads than on others because they go an extra round, also amid the run the increment values are sometimes not equal because some threads spend longer in the for loop printing out their current value than others who have already proceeded to wrap around to the next barrier to wait.

    In order to terminate the threads appropriately and to synchronize them perfectly a flag is needed which will be #pragma omp flushed in order to break the threads at the same time, also another barrier after the increment comparison and print for loop is needed:

    #include <iostream>
    #include <omp.h>
    #include <vector>
    #include <chrono>
    #include <sstream>
    
    
    int main()
    {
    
        const auto start_time = std::chrono::high_resolution_clock::now();
        const int num_threads = 4;
        std::vector<int> arr(num_threads, 0);
        bool breakFlag = false;
    
    #pragma omp parallel num_threads(num_threads) default(none) shared(start_time,arr,std::cout, breakFlag)
        {
    
            int thread_num = omp_get_thread_num();
            while (true) {
                // Check elapsed time
                auto now = std::chrono::high_resolution_clock::now();
                auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(now - start_time).count();
                if (elapsed >= 10) breakFlag = true;
    
    
                arr[thread_num]++;
                std::stringstream ss;
                ss << "Thread " << thread_num << " done grinding.\n";
                std::cout << ss.str();
    
    #pragma omp barrier // Wait for other threads
    #pragma omp flush(breakFlag)
                if(breakFlag) break;
    
                // Read other threads' values
                for (int i = 0; i < num_threads; ++i) {
                    if (i != thread_num && arr[i] != arr[thread_num]) {
                        std::stringstream _ss;
                        _ss << "Thread " << thread_num << ": My value is " << arr[thread_num]
                                  << ", but thread " << i << "'s value is " << arr[i] << ".\n";
                        std::cout << _ss.str();
                    }
                }
    #pragma omp barrier // Wait for other threads
    
            }
        }
    
        std::cout << "Final array values:\n";
        for (int i = 0; i < num_threads; ++i) {
            std::cout << "Index " << i << ": " << arr[i] << "\n";
        }
    
        return 0;
    }