c++multithreading c++11 matrix stdthread

How can I raise a matrix to a power with multiple threads?

I am trying to raise a matrix to a power with multiple threads, but I am not very good with threads. Also I enter the number of threads from keyboard and that number is in range [1, matrix height], then I do the following:

unsigned period = ceil((double)A.getHeight() / threadNum);
unsigned prev = 0, next = period;
for (unsigned i(0); i < threadNum; ++i) {
        threads.emplace_back(&power<long long>, std::ref(result), std::ref(A), std::ref(B), prev, next, p);

        if (next + period > A.getHeight()) {
            prev = next;
            next = A.getHeight();
        }
        else {
            prev = next;
            next += period;
        }
    }

It was easy for me to multiply one matrix by another with multiple threads, but here the problem is that once 1 step is done, for example I need to raise A to the power of 3, A^2 would be that one step, after that step I have to wait for all the threads to finish up, before moving on to doing A^2*A. How can I make my threads wait for that? I'm using std::thread's.

After the first reply was posted I realized that I forgot to mention that I want to create those threads only once, and not recreate them for each multiplication step.

Solution

I would suggest using condition_variable.

Algorithm would be something like this:

Split the matrix in N parts for N threads.
Each thread calculates the necessary resulting sub matrix for a single multiplication.
Then it increments an atomic threads_finished counter using fetch_add and waits on a shared condition variable.
Last thread that finishes (fetch_add()+1 == thread count), notifies all threads, that they can now continue processing.
Profit.

Edit: Here is and example how to stop threads:

#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <vector>
#include <algorithm>
#include <atomic>

void sync_threads(std::condition_variable & cv, std::mutex & mut, std::vector<int> & threads, const int idx) {
    std::unique_lock<std::mutex> lock(mut);
    threads[idx] = 1; 
    if(std::find(threads.begin(),threads.end(),0) == threads.end()) {
        for(auto & i: threads)
            i = 0;
        cv.notify_all();
    } else {
        while(threads[idx])
            cv.wait(lock);
    }
}

int main(){

    std::vector<std::thread> threads;

    std::mutex mut;
    std::condition_variable cv;

    int max_threads = 10;
    std::vector<int> thread_wait(max_threads,0);

    for(int i = 0; i < max_threads; i++) {
        threads.emplace_back([&,i](){
                std::cout << "Thread "+ std::to_string(i)+" started\n";
                sync_threads(cv,mut,thread_wait,i);
                std::cout << "Continuing thread " + std::to_string(i) + "\n";
                sync_threads(cv,mut,thread_wait,i);
                std::cout << "Continuing thread for second time " + std::to_string(i) + "\n";

            });
    }

    for(auto & i: threads)
        i.join();
}

The interesting part is here:

void sync_threads(std::condition_variable & cv, std::mutex & mut, std::vector<int> & threads, const int idx) {
    std::unique_lock<std::mutex> lock(mut); // Lock because we want to modify cv
    threads[idx] = 1; // Set my idx to 1, so we know we are sleeping
    if(std::find(threads.begin(),threads.end(),0) == threads.end()) {
        // I'm the last thread, wake up everyone
        for(auto & i: threads)
            i = 0;
        cv.notify_all();
    } else { //I'm not the last thread - sleep until all are finished
        while(threads[idx]) // In loop so, if we wake up unexpectedly, we go back to sleep. (Thanks for pointing that out Yakk)
            cv.wait(lock);
    }
}