c gcc arm loop-unrolling

How to tell the compiler to unroll this loop


I have the following loop that I am running on an ARM processor.

// pin points into some part of an array
for (i = 0; i < v->numelements; i++)
{
    pe   = pptr[i];
    peParent = pe->parent;

    SPHERE  *ps = (SPHERE *)(pe->data);

    pin[0] = FLOAT2FIX(ps->rad2);
    pin[1] = *peParent->procs->pe_intersect == &SphPeIntersect;
    fixifyVector( &pin[2], ps->center ); // Is an inline function

    pin = pin + 5;
}

Judging by how slowly the loop runs, the compiler was unable to unroll it; when I unroll it manually, it becomes quite fast. I think the compiler is getting confused by the pin pointer. Can we use the restrict keyword to help the compiler here, or is restrict reserved for function parameters only? In general, how can we tell the compiler to unroll the loop and not worry about the pin pointer?


Solution

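  • The optimization flag -funroll-loops tells gcc to unroll loops whose number of iterations can be determined at compile time or on entry to the loop.

    To enable this for one function only, rather than for the whole file, put this attribute on the function definition:

    __attribute__((optimize("unroll-loops")))

    Note that the attribute covers every loop in that function. To target a single loop, GCC 8 and later also accept #pragma GCC unroll n placed directly before the loop.

    See this answer for more details.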

    Edit

    If the compiler cannot determine the number of iterations on entry to the loop, you will need -funroll-all-loops instead. Note what the documentation says about it: "Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly."
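
As a rough sketch of the options in this answer, the C below puts the function-level attribute next to the per-loop #pragma GCC unroll form (GCC 8 and later). The struct vec type, its fields, and the fill_attr and fill_pragma functions are hypothetical stand-ins for the question's code, not part of it. Both functions copy the count into a local first, on the assumption that stores through pin might otherwise be treated as able to modify v->numelements, which would leave the trip count uncertain on entry to the loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for the question's container. */
struct vec {
    int numelements;
    const int *items;
};

/* Function-level form: the optimize attribute applies to the whole
   function, so every loop in fill_attr becomes an unrolling candidate. */
__attribute__((optimize("unroll-loops")))
void fill_attr(int *pin, const struct vec *v)
{
    /* Copy the count to a local so that stores through pin cannot
       change the loop bound after the loop has been entered. */
    const int n = v->numelements;

    for (int i = 0; i < n; i++) {
        pin[0] = v->items[i] * 2;
        pin[1] = v->items[i] > 0;
        pin += 2;
    }
}

/* Per-loop form (GCC 8 and later): the pragma applies only to the
   loop that immediately follows it and requests an unroll factor of 4. */
void fill_pragma(int *pin, const struct vec *v)
{
    const int n = v->numelements;

    #pragma GCC unroll 4
    for (int i = 0; i < n; i++) {
        pin[0] = v->items[i] * 2;
        pin[1] = v->items[i] > 0;
        pin += 2;
    }
}
```

Both forms are requests and not guarantees, so it is worth checking the generated assembly (for example with gcc -O2 -S) to see whether the loop was actually unrolled.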