Search code examples
coptimizationmemoryperformance32-bit

fast way to check if an array of chars is zero


I have an array of bytes, in memory. What's the fastest way to see if all the bytes in the array are zero?


Solution

  • Nowadays, short of using SIMD extensions (such as SSE on x86 processors), you might as well iterate over the array and compare each value to 0.

    In the distant past, performing a comparison and conditional branch for each element in the array (in addition to the loop branch itself) would have been deemed expensive and, depending on how often (or early) you could expect a non-zero element to appear in the array, you might have elected to completely do without conditionals inside the loop, using solely bitwise-or to detect any set bits and deferring the actual check until after the loop completes:

    int sum = 0;
    for (i = 0; i < ARRAY_SIZE; ++i) {
      sum |= array[i];
    }
    if (sum != 0) {
      printf("At least one array element is non-zero\n");
    }
    

    However, with today's pipelined super-scalar processor designs complete with branch prediction, all non-SSE approaches are virtualy indistinguishable within a loop. If anything, comparing each element to zero and breaking out of the loop early (as soon as the first non-zero element is encountered) could be, in the long run, more efficient than the sum |= array[i] approach (which always traverses the entire array) unless, that is, you expect your array to be almost always made up exclusively of zeroes (in which case making the sum |= array[i] approach truly branchless by using GCC's -funroll-loops could give you the better numbers -- see the numbers below for an Athlon processor, results may vary with processor model and manufacturer.)

    #include <stdio.h>
    
    int a[1024*1024];
    
    /* Methods 1 & 2 are equivalent on x86 */  
    
    int main() {
      int i, j, n;
    
    # if defined METHOD3
      int x;
    # endif
    
      for (i = 0; i < 100; ++i) {
    #   if defined METHOD3
        x = 0;
    #   endif
        for (j = 0, n = 0; j < sizeof(a)/sizeof(a[0]); ++j) {
    #     if defined METHOD1
          if (a[j] != 0) { n = 1; }
    #     elif defined METHOD2
          n |= (a[j] != 0);
    #     elif defined METHOD3
          x |= a[j];
    #     endif
        }
    #   if defined METHOD3
        n = (x != 0);
    #   endif
    
        printf("%d\n", n);
      }
    }
    
    $ uname -mp
    i686 athlon
    $ gcc -g -O3 -DMETHOD1 test.c
    $ time ./a.out
    real    0m0.376s
    user    0m0.373s
    sys     0m0.003s
    $ gcc -g -O3 -DMETHOD2 test.c
    $ time ./a.out
    real    0m0.377s
    user    0m0.372s
    sys     0m0.003s
    $ gcc -g -O3 -DMETHOD3 test.c
    $ time ./a.out
    real    0m0.376s
    user    0m0.373s
    sys     0m0.003s
    
    $ gcc -g -O3 -DMETHOD1 -funroll-loops test.c
    $ time ./a.out
    real    0m0.351s
    user    0m0.348s
    sys     0m0.003s
    $ gcc -g -O3 -DMETHOD2 -funroll-loops test.c
    $ time ./a.out
    real    0m0.343s
    user    0m0.340s
    sys     0m0.003s
    $ gcc -g -O3 -DMETHOD3 -funroll-loops test.c
    $ time ./a.out
    real    0m0.209s
    user    0m0.206s
    sys     0m0.003s