Search code examples
c++cneoncortex-a8arm7

Efficient floating point comparison (Cortex-A8)


There is a big (~100 000) array of floating point variables, and there is a threshold (also floating point).

The problem is that I have to compare each one variable from the array with a threshold, but NEON flags transfer takes a really long time (~20 cycles in accordance to a profiler).

Is there any efficient way to compare these values?

NOTE: As rounding error doesn't matter, I tried the following:

float arr[10000];
float threshold; 
....

int a = arr[20]; // e.g.
int t = threshold;
if (t > a) {....}

But in this case I getting the following processor command sequence:

vldr.32        s0, [r0]
vcvt.s32.f32   s0, s0
vmov           r0, s0    <--- takes 20 cycles as `vmrs APSR_nzcv, fpscr` in case of 
cmp            r0, r1         floating point comparison

As conversion happens at NEON, there is no matter if I compare integers, by described way or floats.


Solution

  • If floats are 32-bit IEEE-754 and ints are 32-bit too and if there are no +infinity, -infinity and NaN values, we can compare floats as ints with a little trick:

    #include <stdio.h>
    #include <limits.h>
    #include <assert.h>
    
    #define C_ASSERT(expr) extern char CAssertExtern[(expr)?1:-1]
    C_ASSERT(sizeof(int) == sizeof(float));
    C_ASSERT(sizeof(int) * CHAR_BIT == 32);
    
    int isGreater(float* f1, float* f2)
    {
      int i1, i2, t1, t2;
    
      i1 = *(int*)f1;
      i2 = *(int*)f2;
    
      t1 = i1 >> 31;
      i1 = (i1 ^ t1) + (t1 & 0x80000001);
    
      t2 = i2 >> 31;
      i2 = (i2 ^ t2) + (t2 & 0x80000001);
    
      return i1 > i2;
    }
    
    int main(void)
    {
      float arr[9] = { -3, -2, -1.5, -1, 0, 1, 1.5, 2, 3 };
      float thr;
      int i;
    
      // Make sure floats are 32-bit IEE754 and
      // reinterpreted as integers as we want/expect
      {
        static const float testf = 8873283.0f;
        unsigned testi = *(unsigned*)&testf;
        assert(testi == 0x4B076543);
      }
    
      thr = -1.5;
      for (i = 0; i < 9; i++)
      {
        printf("%f %s %f\n", arr[i], "<=\0> " + 3*isGreater(&arr[i], &thr), thr);
      }
    
      thr = 1.5;
      for (i = 0; i < 9; i++)
      {
        printf("%f %s %f\n", arr[i], "<=\0> " + 3*isGreater(&arr[i], &thr), thr);
      }
    
      return 0;
    }
    

    Output:

    -3.000000 <= -1.500000
    -2.000000 <= -1.500000
    -1.500000 <= -1.500000
    -1.000000 >  -1.500000
    0.000000 >  -1.500000
    1.000000 >  -1.500000
    1.500000 >  -1.500000
    2.000000 >  -1.500000
    3.000000 >  -1.500000
    -3.000000 <= 1.500000
    -2.000000 <= 1.500000
    -1.500000 <= 1.500000
    -1.000000 <= 1.500000
    0.000000 <= 1.500000
    1.000000 <= 1.500000
    1.500000 <= 1.500000
    2.000000 >  1.500000
    3.000000 >  1.500000
    

    Of course, it makes sense to precalculate that final integer value in isGreater() that's used in the comparison operator if your threshold doesn't change.

    If you are afraid of undefined behavior in C/C++ in the above code, you can rewrite the code in assembly.