I have the following loop that I am running on an ARM processor.
// pin points into some part of an array
for (i = 0; i < v->numelements; i++)
{
    pe = pptr[i];
    peParent = pe->parent;
    SPHERE *ps = (SPHERE *)(pe->data);
    pin[0] = FLOAT2FIX(ps->rad2);
    pin[1] = *peParent->procs->pe_intersect == &SphPeIntersect;
    fixifyVector( &pin[2], ps->center ); // fixifyVector is an inline function
    pin = pin + 5;
}
Judging by how slowly the loop runs, I believe the compiler was unable to unroll it: when I unroll it manually, it becomes much faster. I think the compiler is getting confused by the pin pointer. Can we use the restrict keyword to help the compiler here, or is restrict reserved for function parameters? In general, how can we tell the compiler to unroll the loop and not worry about the pin pointer?
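On the restrict question: C99 allows restrict on any object pointer declaration, including a block-scope local such as pin, not only on function parameters. Whether a particular GCC version actually exploits a block-scope restrict as well as it does a parameter one varies, so treat this as something to measure. The sketch below is a simplified, self-contained version of the loop; Sphere, float2fix (assumed 16.16 fixed point) and the is_sphere field are hypothetical stand-ins for SPHERE, FLOAT2FIX and the pe_intersect comparison, which the question does not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real SPHERE type, FLOAT2FIX and
   fixifyVector are not shown in the question, so these are assumptions. */
typedef struct {
    float rad2;
    float center[3];
    int is_sphere; /* stands in for the pe_intersect == &SphPeIntersect test */
} Sphere;

/* Assumed 16.16 fixed-point conversion (hypothetical FLOAT2FIX). */
static inline int32_t float2fix(float f)
{
    return (int32_t)(f * 65536.0f);
}

/* Packs five int32_t values per sphere into out. */
void pack_spheres(int32_t *out, const Sphere *const *spheres, size_t n)
{
    /* Block-scope restrict: within this function body the output is only
       accessed through pin, so the compiler may assume stores through pin
       never modify the Sphere objects or the spheres array. */
    int32_t *restrict pin = out;

    for (size_t i = 0; i < n; i++) {
        const Sphere *ps = spheres[i];
        pin[0] = float2fix(ps->rad2);
        pin[1] = ps->is_sphere;
        pin[2] = float2fix(ps->center[0]);
        pin[3] = float2fix(ps->center[1]);
        pin[4] = float2fix(ps->center[2]);
        pin += 5;
    }
}
```

The promise is only valid if out really does not overlap the sphere data; if it does, the behavior is undefined.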
To tell gcc to unroll all loops, you can use the optimization flag -funroll-loops.
To enable unrolling only in a specific function (the attribute applies to the whole function, not to an individual loop), you can use:
__attribute__((optimize("unroll-loops")))
See this answer for more details.
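A minimal sketch of applying the attribute to one function, assuming GCC; sum_fixed and the UNROLL_LOOPS_IN_FUNCTION macro are hypothetical names. Clang does not implement the optimize attribute, so the macro expands to nothing there. Recent GCC manuals also describe this attribute as intended mainly for debugging, so the command-line flag is the more conventional route.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC's optimize attribute applies to a whole function, not to a single
   loop. Clang does not support it, so the macro expands to nothing there. */
#if defined(__GNUC__) && !defined(__clang__)
#define UNROLL_LOOPS_IN_FUNCTION __attribute__((optimize("unroll-loops")))
#else
#define UNROLL_LOOPS_IN_FUNCTION
#endif

/* Hypothetical example function: under GCC, every loop in its body is
   compiled as if -funroll-loops had been passed. */
UNROLL_LOOPS_IN_FUNCTION
int32_t sum_fixed(const int32_t *v, size_t n)
{
    int32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
```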
Edit
If the compiler cannot determine the number of iterations of the loop upon entry, you will need to use -funroll-all-loops. Note that the documentation says: "Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly."
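In the question's loop, the bound is re-read from v->numelements on every iteration. If pin and numelements have compatible types (for example both int), the compiler must assume a store through pin could change that field, so the trip count may not be known on entry. A sketch, assuming GCC 8 or later (which added #pragma GCC unroll), that avoids -funroll-all-loops by copying the bound into a local and requesting unrolling for just that loop; Vec and pack_pairs are hypothetical simplifications, not the question's real types.

```c
#include <stddef.h>

/* Hypothetical stand-in for the question's v: only the element count and
   a source array are modeled here. */
typedef struct {
    int numelements;
    const int *src;
} Vec;

/* Writes two ints per element into pin. */
void pack_pairs(int *pin, const Vec *v)
{
    /* Copy the bound into a local. v->numelements is an int and pin is an
       int *, so the compiler must assume a store through pin could change
       it; re-reading it every iteration would hide the trip count. */
    const int n = v->numelements;

#pragma GCC unroll 4
    for (int i = 0; i < n; i++) {
        pin[0] = v->src[i];
        pin[1] = 2 * v->src[i];
        pin += 2;
    }
}
```

The pragma must sit immediately before the loop it controls; compilers that do not recognize it simply ignore it (possibly with a warning).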