The central function in my code looks like this (everything else is vanilla input and output):
#include <stdlib.h>

const int n = 40000;
double * foo (double const * const x)
{
double * y = malloc (n*sizeof(double));
y[0] = x[0] + (0.2*x[1]*x[0] - x[2]*x[2]);
y[1] = x[1] + (0.2*x[1]*x[0] - x[2]*x[2]);
// …
// 39997 lines of similar code
// that cannot be simplified to fewer lines
// …
y[39999] = 0.5*x[39999] - x[12345] + 5*x[0];
return y;
}
Assume for the purpose of this question that hard-coding these 40000 lines like this (or very similarly) is really necessary. All of these lines contain only basic arithmetic operations with fixed numbers and entries of x (forty per line on average); no functions are called. The total size of the source is 14 MB.
When trying to compile this code, I face excessive memory usage by the compiler. I could get Clang to compile it with -O0 (which takes only 20 s), but I failed with GCC (even with -O0) and with -O1.
While there is little that can be optimised on the code side or on a global scale (i.e., by computing the individual lines in another order), I am confident that a compiler will find some things to optimise on a local scale, e.g., computing the bracketed term that is needed for both y[0] and y[1] only once.
My questions are thus:
The following comment by Lee Daniel Crocker solved the problem:
I suspect the limit you're running into is the size of the structures needed for a single stack frame/block/function. Try breaking it up into, say, 100 functions of 400 lines each and see if that does better.
When using functions of 100 lines each (and calling them all in a row), I obtained a program that I could compile with -O2 without any problem.