Earlier I read about mozjpeg. A project from Mozilla to create a jpeg encoder that is more efficient, i.e. creates smaller files.
As I understand (jpeg) codecs, a jpeg encoder would need to create files that use an encoding scheme that can also be decoded by other jpeg codecs. So how is it possible to improve the codec without breaking compatibility with other codecs?
Mozilla does mention that the first step for their encoder is to add functionality that can detect the most efficient encoding scheme for a certain image, which would not break compatibility. However, they intend to add more functionality, first of which is "trellis quantization", which seems to be a highly technical algorithm to do something (I don't understand).
I'm also not entirely sure this quetion belongs on stack overflow, it might also fit superuser, since the question is not specifically about programming. So if anyone feels it should be on superuser, feel free to move this question
JPEG is somewhat unique in that it involves a series of compression steps. There are two that provide the most opportunities for reducing the size of the image.
The first is sampling. In JPEG one usually converts from RGB to YCbCR. In RGB, each component is equal in value. In YCbCr, the Y component is much more important than the Cb and Cr components. If you sample the later at 4 to 1, a 4x4 block of pixels gets reduced from 16+16+16 to 16+1+1. Just by sampling you have reduced the size of the data to be compressed by nearly 1/3.
The other is quantization. You take the sampled pixel values, divide them into 8x8 blocks and perform the Discrete Cosine transform on them. In 8bpp this takes 8x8 8-bit data and converts it to 8x8 16 bit data (inverse compression at that point).
The DCT process tends to produce larger values in the upper right corner and smaller values (close to zero) towards the lower left corner. The upper right coefficients are more valuable than the lower left coefficients.
The 16-bit values are then "quantized" (division in plain english).
The compression process defines an 8x8 quantization matrix. Divide the corresponding entry in the DCT coefficients by the value in the quantization matrix. Because this is integer division, the small values will go to zero. Long runs of zero values are combined using run-length compression. The more consecutive zeros you get, the better the compression.
Generally, the quantization values are much higher at the lower left than in the upper right. You try to force these DCT coefficients to be zero unless they are very large.
This is where much of the loss (not all of it though) comes from in JPEG.
The trade off is to get as many zeros as you can without noticeably degrading the image.
The choice of quantization matrices is the major factor in compression. Most JPEG libraries present a "quality" setting to the user. This translates into the selection of a quantization matrices in the encoder. If someone could devise better quantization matrices, you could get better compression.
This book explains the JPEG process in plain English: