c++ optimization arm neon

Bilinear Interpolation from C to Neon


I'm trying to downsample an image using NEON. As practice, I first wrote a function that subtracts two images using NEON intrinsics, and that worked. Now I have come back to writing the bilinear interpolation with NEON intrinsics. Right now I have two problems: loading 4 pixels from one row and one column, and computing the interpolated value (gray) from those 4 pixels or, if possible, from 8 pixels from one row and one column. I have thought about it, but I suspect the algorithm needs to be rewritten entirely?

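#include <arm_neon.h>
#include <stdint.h>

void resizeBilinearNeon( const uint8_t *src, uint8_t *dest,  float srcWidth,  float srcHeight,  float destWidth,  float destHeight)
{
    int x, y, index, gray;

    // Integer copies of the dimensions, used for loop bounds and indexing.
    const int srcW  = (int)srcWidth;
    const int destW = (int)destWidth;
    const int destH = (int)destHeight;

    float x_ratio = (srcWidth-1)/destWidth ;
    float y_ratio = (srcHeight-1)/destHeight ;
    float x_diff, y_diff;

    for (int i=0;i<destH;i++) {
        for (int j=0;j<destW;j++) {
            x = (int)(x_ratio * j) ;
            y = (int)(y_ratio * i) ;
            x_diff = (x_ratio * j) - x ;
            y_diff = (y_ratio * i) - y ;
            index = y*srcW+x ;

            // vld1_u8 takes a pointer and always loads 8 bytes, so near the
            // end of the image it can read past the buffer.
            uint8x8_t pixels_r = vld1_u8 (&src[index]);
            uint8x8_t pixels_c = vld1_u8 (&src[index+srcW]);

            // Y = A(1-w)(1-h) + B(w)(1-h) + C(h)(1-w) + Dwh
            // Only lanes 0 and 1 of each vector are used here.
            gray = (int)(
                        vget_lane_u8(pixels_r, 0)*(1-x_diff)*(1-y_diff) +  vget_lane_u8(pixels_r, 1)*(x_diff)*(1-y_diff) +
                        vget_lane_u8(pixels_c, 0)*(y_diff)*(1-x_diff)   +  vget_lane_u8(pixels_c, 1)*(x_diff*y_diff)
                        ) ;

            dest[i*destW + j] = (uint8_t)gray ;
        }
    }
}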

Solution

  • @MarkRansom is not correct about nearest neighbor versus 2x2 bilinear interpolation; bilinear using 4 pixels will produce better output than nearest neighbor. He is correct that averaging the appropriate number of pixels (more than 4 if scaling by > 2:1) will produce better output still. However, NEON will not help with image downsampling unless the scaling is done by an integer ratio.

    The main benefit of NEON and other SIMD instruction sets is being able to process 8 or 16 pixels at once using the same operations. By accessing individual elements the way you are, you lose all of the SIMD benefit. Another problem is that moving data from NEON registers to ARM registers is a slow operation. Downsampling images is best done on a GPU or with optimized ARM instructions.
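To illustrate the integer-ratio case mentioned above, here is a minimal sketch of a 2:1 downsample that averages each 2x2 block and produces 8 output pixels per loop iteration. It is not the asker's bilinear routine: the function name `downsample2x2` is hypothetical, and the code assumes an 8-bit grayscale image with tightly packed rows and even dimensions. The NEON path is only compiled when the `__ARM_NEON` macro is defined (as GCC and Clang do on NEON targets); otherwise the scalar loop handles the whole row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Halve an 8-bit grayscale image by averaging each 2x2 block, with rounding.
// Hypothetical helper: assumes tightly packed rows and even srcWidth/srcHeight.
void downsample2x2(const std::uint8_t *src, std::uint8_t *dest,
                   int srcWidth, int srcHeight)
{
    assert(srcWidth % 2 == 0 && srcHeight % 2 == 0);
    const int destWidth  = srcWidth / 2;
    const int destHeight = srcHeight / 2;

    for (int i = 0; i < destHeight; ++i) {
        const std::uint8_t *row0 = src + static_cast<std::size_t>(2 * i) * srcWidth;
        const std::uint8_t *row1 = row0 + srcWidth;
        std::uint8_t *out = dest + static_cast<std::size_t>(i) * destWidth;
        int j = 0;
#if defined(__ARM_NEON)
        // 16 source pixels from each of two rows in, 8 destination pixels out.
        for (; j + 8 <= destWidth; j += 8) {
            uint8x16_t r0 = vld1q_u8(row0 + 2 * j);
            uint8x16_t r1 = vld1q_u8(row1 + 2 * j);
            // Add horizontally adjacent bytes of row 0 into 16-bit lanes.
            uint16x8_t sum = vpaddlq_u8(r0);
            // Do the same for row 1 and accumulate: each lane is now a 2x2 sum.
            sum = vpadalq_u8(sum, r1);
            // Rounding shift right by 2 and narrow to 8 bits: (sum + 2) / 4.
            vst1_u8(out + j, vrshrn_n_u16(sum, 2));
        }
#endif
        // Scalar tail (and the entire row on targets without NEON).
        for (; j < destWidth; ++j) {
            const int sum = row0[2 * j] + row0[2 * j + 1] +
                            row1[2 * j] + row1[2 * j + 1];
            out[j] = static_cast<std::uint8_t>((sum + 2) / 4);
        }
    }
}
```

Every lane goes through the same load, add, and shift sequence, and no value is moved back to an ARM register inside the loop, which is why this case vectorizes well. For larger integer ratios the same idea would need more rows and more pairwise adds per output pixel.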