Imagine calculating a three dimensional array like this:
for (int i = 0; i < I; i++)
{
for (int j = 0; j < J; j++)
{
for (int k = 0; k < K; k++)
{
array[k + j * K + i * K * J] = someValue(i, j, k);
}
}
}
But the k + j * K + i * K * J
part is kinda expensive. Is it possible to tell the compiler to convert the loops into something like this?
array[0] = someValue(0, 0, 0);
array[1] = someValue(0, 0, 1);
array[2] = someValue(0, 0, 2);
array[3] = someValue(0, 1, 0);
...
This would ofcourse make the binaries larger, but would also speed up performance, if this code is executed a lot. Is it possible to do this? Or would I have to generate the code myself and paste it into the source file?
I believe in your particular case, we can re-write the loop as:
auto* scan = array;
for (int i = 0; i < I; i++)
{
for (int j = 0; j < J; j++)
{
for (int k = 0; k < K; k++)
{
*scan++ = someValue(i, j, k);
}
}
}
Reason 1: integer multiplication is incredibly cheap. Calculating k + j * K + i * K * J
is cheaper than retrieving a value from the computer's RAM, and it'll be about as cheap as (if not cheaper than) retrieving it from the CPU's fastest cache.
Reason 2: Compilers are incredibly smart. They can recognize which values change and which values stay the same, and optimize common sub-expressions out of loops (so that they're not performing the same computation multiple times).
Reason 3: Compilers are capable of taking advantage of vectorization instructions. Depending on what someValue
does, it may be able to compute multiple values in parallel on the same core by taking advantage of this. This is true for either method of indexing into array
.
C++ code isn't strictly imperative. Compilers can and do make major and complex optimizations to make code more efficient, and code like the one in your example are easy for them to optimize.