
Which is more efficient in OpenCL: if conditions or for loops?


I have a piece of OpenCL code like this:

if (Sum[0] < Best)
{
    Best = Sum[0];
    iBest = 1;
    *aBits = Bits[0];
}

if (Sum[1] < Best)
{
    Best = Sum[1];
    iBest = 2;
    *aBits = Bits[1];
}

if (Sum[2] < Best)
{
    Best = Sum[2];
    iBest = 3;
    *aBits = Bits[2];
}

if (Sum[3] < Best)
{
    Best = Sum[3];
    iBest = 4;
    *aBits = Bits[3];
}

if (Sum[4] < Best)
{
    Best = Sum[4];
    iBest = 5;
    *aBits = Bits[4];
}

if (Sum[5] < Best)
{
    Best = Sum[5];
    iBest = 6;
    *aBits = Bits[5];
}

if (Sum[6] < Best)
{
    Best = Sum[6];
    iBest = 7;
    *aBits = Bits[6];
}

if (Sum[7] < Best)
{
    Best = Sum[7];
    iBest = 8;
    *aBits = Bits[7];
}

To reduce the amount of code, I rewrote it like this:

index = 0;  // start with the first element as the current minimum
for(i = 1; i < 8; i++)
{
    if(Sum[i] < Sum[index])
        index = i;
}

if (Sum[index] < Best)
{
    Best = Sum[index];
    iBest = index + 1;
    *aBits = Bits[index];
}

But with the second version the latency increased by about 20%. Can anybody provide any insight into this behavior? Is a chain of if conditions more efficient than a for loop in OpenCL?

I'm using an Intel 530 (Gen9) GPU with memory-mapped access to the buffers.


Solution

  • The first version is a poor fit for a GPU. Work items that run together execute in lockstep, so when one of them enters an "if" block, all of them have to step through it. If the work items enter the "if" blocks more or less at random, in the end every instruction in all eight blocks gets executed, and that is more instructions than in the second version.

    In the second version, the body of the "if" inside the loop is much smaller, just a single assignment, and all the work items enter the final "if" section at the same time.

    On a CPU the first version is the better one, since there is no need to save an index and then look it up.

    In any case, avoid reading the same variable from global memory two or three times. The compiler generally cannot optimize those repeated reads away unless it knows the data is read-only (for example, when the pointer is declared const and restrict). This code should be faster than what you wrote:

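    index = 0;                 // start with the first element as the current minimum
    int best_sum = Sum[index]; // private copy (assuming Sum holds ints): fast access
    for(i = 1; i < 8; i++)
    {
        int sum = Sum[i];      // again a private copy, so Sum[i] is read only once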
        if(sum < best_sum){
            index = i;
            best_sum = sum;
        }
    }
    
    if (best_sum < Best)
    {
        Best = best_sum;
        iBest = index + 1;
        *aBits = Bits[index];
    }
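
To check that the loop form really selects the same element as the chain of "if" blocks, the logic can be run on the host in plain C. The sketch below is only an illustration under some assumptions: the helper names `pick_best_chain` and `pick_best_private` are made up for this example, the element type is assumed to be `int`, and the arrays are ordinary host arrays rather than OpenCL buffers. The second helper writes the loop body with conditional expressions, which a compiler may (but is not guaranteed to) turn into branch-free select instructions.

```c
#include <stddef.h>

#define N 8

/* Hypothetical helper: same effect as the eight unrolled "if" blocks in the
 * question. The outputs are updated every time a smaller sum is found. */
static void pick_best_chain(const int sum[N], const int bits[N],
                            int *best, int *i_best, int *a_bits)
{
    for (size_t i = 0; i < N; i++) {
        if (sum[i] < *best) {
            *best = sum[i];
            *i_best = (int)i + 1;   /* iBest is 1-based in the question */
            *a_bits = bits[i];
        }
    }
}

/* Hypothetical helper: the private-copy loop from the answer. Each sum[i] is
 * read once into a local variable, and the outputs are written at most once. */
static void pick_best_private(const int sum[N], const int bits[N],
                              int *best, int *i_best, int *a_bits)
{
    int index = 0;
    int best_sum = sum[0];

    for (int i = 1; i < N; i++) {
        int s = sum[i];
        int better = (s < best_sum);
        /* Conditional expressions instead of an "if" body. */
        index = better ? i : index;
        best_sum = better ? s : best_sum;
    }

    if (best_sum < *best) {
        *best = best_sum;
        *i_best = index + 1;
        *a_bits = bits[index];
    }
}
```

For example, with Sum = {9, 4, 7, 4, 12, 5, 8, 6} and an initial Best of 100, both helpers leave Best at 4 and iBest at 2 (the first of the two tied minima). Matching results on the host say nothing about speed on the GPU, so the kernel still has to be timed on the actual device.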