c++ performance image-processing interpolation bicubic

How can I best improve the execution time of a bicubic interpolation algorithm?


I'm developing image processing software in C++ on Intel hardware that has to run a bicubic interpolation algorithm on small (about 1 kpx) images over and over again. This takes a lot of time, and I'm aiming to speed it up. What I have now is a basic implementation based on the literature; a somewhat faster version that doesn't do matrix multiplication but instead uses pre-calculated formulas for parts of the interpolating polynomial; and, lastly, a fixed-point version of the matrix-multiplying code (which actually runs slower). I also have an external library with an optimized implementation, but it's still too slow for my needs. What I was considering next is:

  • vectorization using MMX/SSE stream processing, on both the floating and fixed-point versions
  • doing the interpolation in the Fourier domain using convolution
  • shifting the work onto a GPU using OpenCL or similar

Which of these approaches could yield the greatest performance gains? Could you suggest another? Thanks.
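For reference, the "pre-calculated formulas" variant mentioned above might look roughly like the sketch below. It is only an illustration, not the actual code in question: the Catmull-Rom kernel (a = -0.5), the row-major single-channel float layout, the border clamping, and the names `cubicWeights` and `bicubicSample` are all assumptions made for the example. The point is that the four weights per axis are evaluated once per sample from expanded polynomials, and the 4x4 neighbourhood is then reduced separably, with no matrix product.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Catmull-Rom cubic weights (kernel parameter a = -0.5, assumed here) for a
// fractional offset t in [0, 1). The four taps sit at offsets -1, 0, +1, +2.
inline std::array<float, 4> cubicWeights(float t) {
    const float t2 = t * t;
    const float t3 = t2 * t;
    std::array<float, 4> w = {{
        -0.5f * t3 + t2 - 0.5f * t,
         1.5f * t3 - 2.5f * t2 + 1.0f,
        -1.5f * t3 + 2.0f * t2 + 0.5f * t,
         0.5f * t3 - 0.5f * t2
    }};
    return w;
}

// Clamp an index into [0, hi] so that border pixels are replicated.
inline int clampIndex(int v, int hi) {
    return std::min(std::max(v, 0), hi);
}

// Sample a row-major, single-channel float image at (x, y).
// The weights are computed once per axis, then applied separably:
// 4 horizontal dot products followed by 1 vertical dot product.
inline float bicubicSample(const std::vector<float>& img,
                           int width, int height, float x, float y) {
    assert(width > 0 && height > 0);
    const int ix = static_cast<int>(std::floor(x));
    const int iy = static_cast<int>(std::floor(y));
    const std::array<float, 4> wx = cubicWeights(x - static_cast<float>(ix));
    const std::array<float, 4> wy = cubicWeights(y - static_cast<float>(iy));

    float result = 0.0f;
    for (int j = 0; j < 4; ++j) {
        const int sy = clampIndex(iy + j - 1, height - 1);
        float row = 0.0f;
        for (int i = 0; i < 4; ++i) {
            const int sx = clampIndex(ix + i - 1, width - 1);
            row += wx[i] * img[sy * width + sx];
        }
        result += wy[j] * row;
    }
    return result;
}
```

The inner four-tap dot product is the part that would map onto a single SSE register of four floats, which is why this layout is a reasonable starting point for the vectorization option listed above.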


Solution

  • I think the GPU is the way to go. This is probably one of the most natural tasks for this type of hardware. I would start by looking into CUDA or OpenCL. Older techniques like simple DirectX/OpenGL pixel/fragment shaders should work just fine as well.

    Some links I found, maybe they could help you: