c++multithreading c++11 dictionary stdthread

Parallel implemention of Lisp-style mapping of a function to a list in C++ fails without cout after use of thread

This code works only when any of the lines under /* debug messages */ are uncommented. Or if the list being mapped to is less than 30 elements.

func_map is a linear implementation of a Lisp-style mapping and can be assumed to work.

Use of it would be as follows func_map(FUNC_PTR foo, std::vector* list, locs* start_and_end)

FUNC_PTR is a pointer to a function that returns void and takes in an int pointer

For example: &foo in which foo is defined as:

void foo (int* num){ (*num) = (*num) * (*num);}

locs is a struct with two members int_start and int_end; I use it to tell func_map which elements it should iterate over.

void par_map(FUNC_PTR func_transform, std::vector<int>* vector_array) //function for mapping a function to a list alla lisp
{
    int array_size = (*vector_array).size(); //retain the number of elements in our vector
    int num_threads = std::thread::hardware_concurrency(); //figure out number of cores    
    int array_sub = array_size/num_threads; //number that we use to figure out how many elements should be assigned per thread

    std::vector<std::thread> threads; //the vector that we will initialize threads in
    std::vector<locs> vector_locs; // the vector that we will store the start and end position for each thread

    for(int i = 0; i < num_threads && i < array_size; i++)
    {
        locs cur_loc; //the locs struct that we will create using the power of LOGIC
        if(array_sub == 0) //the LOGIC
        {
            cur_loc.int_start = i; //if the number of elements in the array is less than the number of cores just assign one core to each element
        }
        else
        {
            cur_loc.int_start = (i * array_sub); //otherwise figure out the starting point given the number of cores
        }

        if(i == (num_threads - 1))
        {
            cur_loc.int_end = array_size; //make sure all elements will be iterated over
        }
        else if(array_sub == 0)
        {
            cur_loc.int_end = (i + 1); //ditto
        }
            else
            {
                cur_loc.int_end = ((i+1) * array_sub); //otherwise use the number of threads to determine our ending point
            }

        vector_locs.push_back(cur_loc); //store the created locs struct so it doesnt get changed during reference
        threads.push_back(std::thread(func_map,
                                  func_transform,
                                   vector_array,
                                    (&vector_locs[i]))); //create a thread

        /*debug messages*/ // <--- whenever any of these are uncommented the code works
        //cout << "i = " << i << endl;
        //cout << "int_start == " << cur_loc.int_start << endl;
        //cout << "int_end == " << cur_loc.int_end << endl << endl;
        //cout << "Thread " << i << " initialized" << endl; 

    }

    for(int i = 0; i < num_threads && i < array_size; i++)
    {
        (threads[i]).join(); //make sure all the threads are done
    }
}

I think that the issue might be in how vector_locs[i] is used and how threads are resolved. But the use of a vector to maintain the state of the locs instance referenced by thread should prevent that from being an issue; I'm really stumped.

Solution

You're giving the thread function a pointer, &vector_locs[i], that may become invalidated as you push_back more items into the vector.
Since you know beforehand how many items vector_locs will contain - min(num_threads, array_size) - you can reserve that space in advance to prevent reallocation.

As to why it doesn't crash if you uncomment the output, I would guess that the output is so slow that the thread you just started will finish before the output is done, so the next iteration can't affect it.