Search code examples
c++if-statementsseintrinsicsmmx

if/else statement in SSE intrinsics


I am trying to optimize a small piece of code with SSE intrinsics (I am a complete beginner on the topic), but I am a little stuck on the use of conditionals.

My original code is:

unsigned long c;
unsigned long constant = 0x12345678;
unsigned long table[256];
int n, k;

for( n = 0; n < 256; n++ )
{
  c = n;
  for( k = 0; k < 8; k++ )
    {
      if( c & 1 ) c = constant ^ (c >> 1);
      else c >>= 1;
    }
  table[n] = c;
}

The goal of this code is to compute a crc table (the constant can be any polynomial, it doesn't play a role here),

I suppose my optimized code would be something like:

__m128 x;
__m128 y;
__m128 *table;

x = _mm_set_ps(3, 2, 1, 0);
y = _mm_set_ps(3, 2, 1, 0);
//offset for incrementation
offset = _mm_set1_ps(4);

for( n = 0; n < 64; n++ )
{
    y = x;
    for( k = 0; k < 8; k++ )
    {
        //if do something with y
        //else do something with y
    }
    table[n] = y;
    x = _mm_add_epi32 (x, offset);
}

I have no idea how to go through the if-else statement, but I suspect there is a clever trick. Has anybody an idea on how to do that?

(Aside from this, my optimization is probably quite poor - any advice or correction on it would be treated with the greatest sympathy)


Solution

  • You can get rid of the if/else entirely. Back in the days when I produced MMX assembly code, that was a common programming activity. Let me start with a series of transformations on the "false" statement:

    c >>= 1;
    
    c = c >> 1;
    
    c = 0 ^ (c >> 1);
    

    Why did I introduce the exclusive-or? Because exclusive-or is also found in the "true" statement:

    c = constant ^ (c >> 1);
    

    Note the similarity? In the "true" part, we xor with a constant, and in the false part, we xor with zero.

    Now I'm going to show you a series of transformations on the entire if/else statement:

    if (c & 1)
        c = constant ^ (c >> 1);          // same as before
    else
        c =        0 ^ (c >> 1);          // just different layout
    
    if (c & 1)
        c =  constant      ^ (c >> 1);
    else
        c = (constant & 0) ^ (c >> 1);    // 0 == x & 0
    
    if (c & 1)
        c = (constant & -1) ^ (c >> 1);   // x == x & -1
    else
        c = (constant &  0) ^ (c >> 1);
    

    Now the two branches only differ in the second argument to the binary-and, which can be calculated trivially from the condition itself, thus enabling us to get rid of the if/else:

    c = (constant & -(c & 1)) ^ (c >> 1);
    

    Disclaimer: This solution only works on a two's complement architecture where -1 means "all bits set".