
How to use std::async efficiently to perform operations on a pointer array


I am very new to the modern C++ standard library and am trying to learn how to use std::async to perform some operations on a large array accessed through a pointer. The sample code I have written crashes at the point where the async task is launched.

Sample code:

#include <iostream>
#include <future>
#include <tuple>
#include <numeric>
#include <cstdlib>   // for rand()


#define maximum(a,b)            (((a) > (b)) ? (a) : (b))

class Foo {
    bool flag;

public:

    Foo(bool b) : flag(b) {}

    //******
    //
    //******
    std::tuple<long long, int> calc(int* a, int begIdx, int endIdx) {
        long sum = 0;
        int max = 0;

        if (!(*this).flag) {
            return std::make_tuple(sum, max);
        }

        if (endIdx - begIdx < 100)
        {
            for (int i = begIdx; i < endIdx; ++i)
            {
                sum += a[i];
                if (max < a[i])
                    max = a[i];
            }
            return std::make_tuple(sum, max);
        }

        int midIdx = endIdx / 2;
        auto handle = std::async(&Foo::calc, this, std::ref(a), midIdx, endIdx);
        auto resultTuple = calc(a, begIdx, midIdx);
        auto asyncTuple = handle.get();

        sum = std::get<0>(asyncTuple) +std::get<0>(resultTuple);
        max = maximum(std::get<1>(asyncTuple), std::get<1>(resultTuple));

        return std::make_tuple(sum, max);
    }

    //******
    //
    //******
    void call_calc(int*& a) {
        auto handle = std::async(&Foo::calc, this, std::ref(a), 0, 10000);
        auto resultTuple = handle.get();

        std::cout << "Sum = " << std::get<0>(resultTuple) << "  Maximum = " << std::get<1>(resultTuple) << std::endl;
    }
};

//******
//
//******
int main() {
    int* nums = new int[10000];
    for (int i = 0; i < 10000; ++i)
        nums[i] = rand() % 10000 + 1;

    Foo foo(true);
    foo.call_calc(nums);

    delete[] nums;
}

Can anyone help me identify why it crashes? Is there a better approach to applying parallelism to operations on a large pointer array?


Solution

  • The fundamental problem is that your code tries to launch more than array size / 100 threads, which here means more than 100 threads. That many threads won't do anything useful; they'll thrash. See std::thread::hardware_concurrency, and in general don't use raw std::async or std::thread in production applications; write task pools and chain futures together instead.

    Launching that many threads is extremely inefficient and could exhaust system resources.
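One way to keep the task count bounded is sketched below, assuming the goal is still the sum and maximum from the question. The helper names `sum_and_max` and `parallel_sum_and_max` are hypothetical, and this is plain chunking rather than a full task pool: the array is cut into roughly one chunk per hardware thread, and each chunk is reduced by a single std::async task.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: sequential sum and maximum of a[beg, end).
// Like the question's code, the maximum starts at 0, which assumes the
// data is non-negative.
std::pair<long long, int> sum_and_max(const int* a, std::size_t beg, std::size_t end) {
    long long sum = 0;
    int max = 0;
    for (std::size_t i = beg; i < end; ++i) {
        sum += a[i];
        max = std::max(max, a[i]);
    }
    return {sum, max};
}

// Hypothetical helper: split a[0, n) into roughly one chunk per hardware
// thread and reduce each chunk in its own std::async task.
std::pair<long long, int> parallel_sum_and_max(const int* a, std::size_t n) {
    assert(a != nullptr || n == 0);

    // hardware_concurrency() is only a hint and may return 0.
    std::size_t tasks = std::max(1u, std::thread::hardware_concurrency());
    tasks = std::min(tasks, std::max<std::size_t>(n, 1));
    const std::size_t chunk = (n + tasks - 1) / tasks;  // ceiling division

    std::vector<std::future<std::pair<long long, int>>> futures;
    futures.reserve(tasks);
    for (std::size_t t = 0; t < tasks; ++t) {
        const std::size_t beg = std::min(n, t * chunk);
        const std::size_t end = std::min(n, beg + chunk);
        // std::launch::async requests a real thread instead of deferred execution.
        futures.push_back(std::async(std::launch::async, sum_and_max, a, beg, end));
    }

    // Combine the partial results on the calling thread.
    long long sum = 0;
    int max = 0;
    for (auto& f : futures) {
        const auto part = f.get();
        sum += part.first;
        max = std::max(max, part.second);
    }
    return {sum, max};
}
```

In the question's program this would be called as `parallel_sum_and_max(nums, 10000)`. On some toolchains the program has to be linked with thread support (for example `-pthread` with GCC).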

    The second problem is that you computed the midpoint of the two indices incorrectly.

    The midpoint of begIdx and endIdx is not endIdx / 2 but rather:

    int midIdx = begIdx + (endIdx-begIdx) / 2;
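To see why the original midpoint is fatal, take the right half [5000, 10000) that the first call hands to std::async. There endIdx / 2 is 5000, which equals begIdx, so the next async task receives [5000, 10000) again and the recursion never shrinks. The sketch below compares the two formulas; `buggy_mid`, `fixed_mid`, and `splits_on_right_half` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Midpoint as written in the question: ignores begIdx entirely.
int buggy_mid(int /*begIdx*/, int endIdx) { return endIdx / 2; }

// Corrected midpoint of [begIdx, endIdx).
int fixed_mid(int begIdx, int endIdx) { return begIdx + (endIdx - begIdx) / 2; }

// Follow the right half (the range passed to std::async in the question)
// until it is shorter than 100 elements, giving up after `cap` splits.
// Returns the number of splits performed.
template <typename Mid>
int splits_on_right_half(Mid mid, int begIdx, int endIdx, int cap) {
    assert(cap >= 0);
    int splits = 0;
    while (endIdx - begIdx >= 100 && splits < cap) {
        begIdx = mid(begIdx, endIdx);  // the right half becomes [mid, endIdx)
        ++splits;
    }
    return splits;
}
```

With the corrected formula the right half of [0, 10000) drops below 100 elements after 7 splits. With the original formula the loop only stops because of the cap.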
    

    Live example.

    You'll notice I discovered the problem with your program by adding intermediate output. In particular, I had it print the ranges it was working on, and I noticed that it was repeating ranges. This is known as "printf debugging", and it is quite powerful, especially when step-based debugging isn't practical (with this many threads, stepping through the code would be mind-numbing).
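A minimal sketch of that kind of tracing follows. It is not the code from the live example; `format_range`, `trace_range`, and `trace_mutex` are hypothetical names. The mutex keeps lines from different threads from interleaving.

```cpp
#include <cassert>
#include <cstdio>
#include <mutex>
#include <string>

// Guards stderr so that output from concurrent tasks stays line by line.
std::mutex trace_mutex;

// Build a line such as "calc: [5000, 10000)".
std::string format_range(const char* where, int begIdx, int endIdx) {
    assert(where != nullptr);
    return std::string(where) + ": [" + std::to_string(begIdx) + ", " +
           std::to_string(endIdx) + ")";
}

// Print the half-open range a call is about to work on.
void trace_range(const char* where, int begIdx, int endIdx) {
    const std::string line = format_range(where, begIdx, endIdx);
    std::lock_guard<std::mutex> lock(trace_mutex);
    std::fprintf(stderr, "%s\n", line.c_str());
}
```

Calling `trace_range("calc", begIdx, endIdx);` as the first statement of Foo::calc should make the repetition visible: with the original midpoint, ranges such as [5000, 10000) would be expected to appear over and over in the log.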