Tags: optimization, performance, dynamic-memory-allocation

Allocating memory inside loop vs outside loop


Is there a noticeable performance penalty for allocating LARGE chunks of heap memory in every iteration of a loop? Of course, I free it at the end of each iteration.

An alternative would be to allocate once before entering the loop, reuse the memory across all iterations, and free it after exiting the loop. See the code below.

// allocation inside loop
for(int i = 0; i < iter_count; i++) {
    float *array = new float[size]();
    do_something(array);
    delete []array;
}

// allocation outside loop
float *array = new float[size]();
for(int i = 0; i < iter_count; i++) {
    do_something(array);
}
delete []array;

Solution

    • Even if allocation were constant time, you pay that cost TxN times instead of T. In addition, if you initialize the chunk at all (even just zeroing it), you repeatedly thrash your cache (see the sketch after this list).
    • The major performance hit of heap allocation is fragmentation, not allocation time, and fragmentation is a cumulative problem. Accumulate less.

    • There are also some pathological cases. If there is a lot of short-lived allocation activity that "spans" the deallocation and reallocation of the chunk (e.g. the same routine running in another thread), you may frequently force the heap manager to find new memory for the big chunk because its old spot is currently occupied. That really fragments your heap and increases your working set.
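
    As a concrete illustration of the first two points, the buffer can be hoisted out of the loop and reset in place each iteration instead of being reallocated and re-zeroed. A minimal sketch, assuming do_something from the question only needs the buffer zeroed at the start of each iteration (names and signatures are placeholders):

// allocate once, reset in place each iteration
#include <vector>
#include <algorithm>
#include <cstddef>

void do_something(float *array);   // hypothetical per-iteration work from the question

void run(std::size_t size, int iter_count) {
    std::vector<float> array(size);                   // one allocation, zero-initialized once
    for (int i = 0; i < iter_count; ++i) {
        std::fill(array.begin(), array.end(), 0.0f);  // cheap in-place reset; drop it if stale data is fine
        do_something(array.data());
    }
}                                                     // single deallocation when the vector goes out of scope

    This keeps the allocator out of the hot loop entirely, so neither the TxN allocation cost nor the fragmentation pressure applies.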

    So there's the direct hit, which can be measured directly: how much does new/delete cost compared to do_something()? If do_something is expensive, you might not measure much.
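
    That direct hit is straightforward to time. A rough sketch of such a measurement, using std::chrono and the hypothetical do_something from the question (the result will depend heavily on size, the allocator, and what do_something actually does):

// timing the two variants (a sketch, not a rigorous benchmark)
#include <chrono>
#include <cstddef>
#include <cstdio>

void do_something(float *array);   // hypothetical per-iteration work from the question

void benchmark(std::size_t size, int iter_count) {
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    for (int i = 0; i < iter_count; ++i) {
        float *array = new float[size]();    // allocate + zero every iteration
        do_something(array);
        delete[] array;
    }
    auto t1 = clock::now();

    float *array = new float[size]();        // allocate + zero once
    for (int i = 0; i < iter_count; ++i) {
        do_something(array);
    }
    delete[] array;
    auto t2 = clock::now();

    auto us = [](auto d) {
        return (long long)std::chrono::duration_cast<std::chrono::microseconds>(d).count();
    };
    std::printf("allocation inside loop:  %lld us\n", us(t1 - t0));
    std::printf("allocation outside loop: %lld us\n", us(t2 - t1));
}

    This only captures the direct new/delete cost; the accumulated heap pressure described below will not show up in a micro-benchmark like this.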

    And there's the "heap pressure" that accumulates in a large application. Its contribution is hard to measure, and you might hit a performance brick wall built by a dozen independent contributors that are hard to identify after the fact.