Tags: c, libjpeg, libjpeg-turbo

Is it better to use jpeg_write_scanlines with multiple scanlines at once?


I'm using libjpeg (or libjpeg-turbo) for JPEG encoding, and I was wondering whether there is any improvement from passing multiple scanlines at once to the jpeg_write_scanlines function. I ran some tests on 720x288 images, but I only get a 0.5% speedup when passing the whole image at once.

I guess this gain is just due to the reduced function-call overhead, but I was expecting a bit more, at least with libjpeg-turbo.

The performance test was run with Callgrind (in Valgrind), so maybe I'm missing something, or I have really misunderstood how the JPEG encoder works.


Solution

  • JPEG encodes the image in rows of blocks, and the minimum height of such a row is called the MCU height. It is 8 lines in images without subsampling (4:4:4 mode) or 16 lines if chroma is subsampled vertically (4:2:0 mode).

    If you feed libjpeg exactly these 8 or 16 lines per call, it can process the whole MCU row in one go. Otherwise it has to do extra bookkeeping or buffering.

    Writing multiple MCU heights at a time, or the whole image, won't hurt.
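The batching described above can be sketched as a small loop that hands rows to the encoder one MCU height at a time. This is only an illustration under stated assumptions, not libjpeg's own code: `feed_in_mcu_batches`, `write_rows_fn`, `count_rows` and `MAX_BATCH` are hypothetical names, and the callback merely mimics the contract of jpeg_write_scanlines (take up to n row pointers, return how many rows were consumed) so that the example compiles without the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on batch size: 8 lines times a vertical
   sampling factor of at most 4. */
#define MAX_BATCH 32

/* Hypothetical callback with the same contract as jpeg_write_scanlines:
   it is offered n row pointers and returns how many rows it consumed. */
typedef unsigned (*write_rows_fn)(void *ctx, unsigned char **rows, unsigned n);

/* Feed `height` rows of `stride` bytes each to `write_rows`, at most
   `mcu_height` rows per call. Returns the number of calls made. */
unsigned feed_in_mcu_batches(unsigned char *pixels, size_t stride,
                             unsigned height, unsigned mcu_height,
                             write_rows_fn write_rows, void *ctx)
{
    unsigned char *rows[MAX_BATCH];
    unsigned next = 0;   /* index of the next row to hand over */
    unsigned calls = 0;

    assert(mcu_height > 0 && mcu_height <= MAX_BATCH);

    while (next < height) {
        unsigned remaining = height - next;
        unsigned n = remaining < mcu_height ? remaining : mcu_height;
        unsigned i, written;

        /* Build the row-pointer array for this batch. */
        for (i = 0; i < n; i++)
            rows[i] = pixels + (size_t)(next + i) * stride;

        written = write_rows(ctx, rows, n);
        calls++;
        if (written == 0)
            break;       /* consumer accepted nothing; avoid spinning */
        next += written; /* the consumer may accept fewer rows than offered */
    }
    return calls;
}

/* Stand-in consumer for demonstration: counts the rows it is given. */
unsigned count_rows(void *ctx, unsigned char **rows, unsigned n)
{
    (void)rows;
    *(unsigned *)ctx += n;
    return n;
}
```

With libjpeg, the callback would presumably be a thin wrapper around jpeg_write_scanlines(&cinfo, rows, n), and the MCU height could be taken as cinfo.max_v_samp_factor * DCTSIZE once jpeg_start_compress has been called. For the 288-line image in the question and an MCU height of 16, the loop makes 18 calls instead of 288, which is consistent with the gain coming mostly from reduced call overhead.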