Time taken increasing exponentially when comparing elements in vectors

I have a piece of code that looks like this:

void MonteCarloRainbowOptionFunction::ValueInstrument()
{
    std::vector<MJArray> correlatedNormVariates = GetArraysOfCorrelatedGauassiansByBoxMuller(numberOfPaths, covMatrix); 
    std::vector<StatisticAllPaths> thesePathGatherers;
    for (unsigned long i = 0; i < S_vect.size(); i++)
    {
        StandardExcerciseOption thisOption(ThePayOffVect[i], TTM);
        StatisticAllPaths onePathGatherer;
        thesePathGatherers.push_back(onePathGatherer);
        OneStepMonteCarloValuation(thisOption, S_vect[i], impvol_vect[i], r, d_vect[i], numberOfPaths, correlatedNormVariates[i], thesePathGatherers[i]);
    }
    f = 0;
    for (unsigned long i = 0; i < numberOfPaths; i++)
    {       
        int count = 0;
        if (i % 2500 == 0)
        {
            count += 1;
            std::cout << i << " paths done" << "\n";
        }
        std::vector<double> outcomes;
        for (unsigned long j = 0; j < S_vect.size(); j++)
        {
            outcomes.push_back(thesePathGatherers[j].GetResultsSoFar()[0][i]);
        }
        switch (optionType)
        {
        case RainbowOptionType::best_of:
            f += *max_element(outcomes.begin(), outcomes.end());
            break;
        case RainbowOptionType::worst_of:
            f += *min_element(outcomes.begin(), outcomes.end());
            break;          
        default:
            throw std::runtime_error("Unsupported or unknown option type found in " + GetuniqueIdentifier()[0]);
            break;
        }   
    }
    f *= ((double)nominal/numberOfPaths);
    return;
}

In the first part (before the first "Time Check") I simulate a large amount of values which gets saved in the vectors (really in the first vector of a vector of vectors) thesePathGatherers through the function OneStepMonteCarloValuation. Like so (the time taken to do this is minimal):

void OneStepMonteCarloValuation(const StandardExcerciseOption& TheOption, double Spot, double Vol, double r, double d, unsigned long NumberOfPaths, MJArray normVariates, StatisticsMC& gatherer)
{
    if (normVariates.size() != NumberOfPaths)
        throw("mismatched number of paths and normal variates");
    //Pre-calculate as much as possible
    double Expiry = TheOption.GetExpiry();
    double variance = Vol * Vol * Expiry;
    double rootVariance = sqrt(variance);
    double itoCorrection = -0.5 * variance;
    double movedSpot = Spot * exp((r-d) * Expiry + itoCorrection);
    double thisSpot;
    double discounting = exp(-r * Expiry);
    for (unsigned long i = 0; i < NumberOfPaths; i++)
    {
        thisSpot = movedSpot * exp(rootVariance * normVariates[i]);
        double thisPayoff = TheOption.OptionPayOff(thisSpot);
        gatherer.DumpOneResult(discounting * thisPayoff);
    }
    return;
}

And the gatherer it calls here is this class:

StatisticAllPaths::StatisticAllPaths(const unsigned long minimumNumberOfPaths) : PathsDone(0)
{
    ResultList.reserve(minimumNumberOfPaths);
}

void StatisticAllPaths::DumpOneResult(double result)
{
    ResultList.push_back(result);
    PathsDone++;
}

std::vector<std::vector<double>> StatisticAllPaths::GetResultsSoFar() const
{
    vector<double> innerVector(ResultList);
    vector<vector<double> > Results(1, innerVector);
    Results[0] = ResultList;
    return Results;
}

In the second part I go through all the vectors and save the i element in the temp "output" vector, compare the values in output and save the result and then do the same. Looping through all the values in the thesePathGatherers. The first part runs fast but this one takes much longer and even fails to complete all together for high enough numberOfPaths. I added this part:

    if (i % 2500 == 0)
    {
        count += 1;
        std::cout << i << " paths done" << "\n";
    }

To debug and find that doing for example that doing the first 10^4 paths takes much longer if the total amount of paths is big, i.e. each individual path takes longer the more paths you do. Is there something wrong with the GetResultsSoFar() implementation/usage? That's the only thing I can imagine. Any help appreciated

Let me know if there is more I need to share.

Solution

There are answers here focusing on the excessive copying that happens in GetResultsSoFar, but none of them address the pointlessness of how you are calling this function:

std::vector<double> outcomes;
for (unsigned long j = 0; j < S_vect.size(); j++)
{
    outcomes.push_back(thesePathGatherers[j].GetResultsSoFar()[0][i]);
}

Here, you are gathering a single result value for every "path gatherer", but to do that you are copying each result list multiple times before extracting one value. I think the first step of improving this is to avoid any copying of the ResultList because it's not needed.

Instead, how about returning a const reference to the results. I'm going to also rename this function to avoid confusing it with your existing definition. But call it whatever you like:

const std::vector<double>& StatisticAllPaths::GetResultList() const
{
    return ResultList;
}

Of course, the loop would then access the ith value as thesePathGatherers[j].GetResultList()[i]. The critical difference here is no copying happened at all.

There's also a small optimization to do on outcomes. Since you know exactly how many items you are going to push, you should reserve that memory up-front.

i.e.

std::vector<double> outcomes;
outcomes.reserve(S_vect.size());
for (unsigned long j = 0; j < S_vect.size(); j++)
{
    outcomes.push_back(thesePathGatherers[j].GetResultList()[i]);
}

If you want it to be even tidier, consider defining double GetResult(size_t) const that just returns a single result, and then calling thesePathGatherers[j].GetResult(i).