jpegtran wipe: color vs grayscale image

Consider the following simple shell script, run on a Debian/stable system (*):

% ./demo.sh
26b955f9e6987c2617c97a653f1fe3dd  mask_white_gray.jpg
4fe7eb3d04c54887c1b0c9c33f8526bf  mask_gray.jpg

Looking at the intermediate images generated, I can see that there is actually a difference. For instance:

$ compare mask_white_gray.jpg  mask_gray.jpg -compose src diff.png

[Images: mask_gray, mask_white_gray, diff]

I do not understand why jpegtran would select a larger region for the color case compared to the grayscale one. The documentation simply states:

If the wipe region and the region outside the wipe region, when adjusted to the nearest iMCU boundary [...]

Could someone please help me understand what I am seeing? I was under the impression that the iMCU boundaries should have been identical in this case.

(*)

% cat demo.sh
#!/bin/sh
#set -x

convert -size 100x100 xc:white orig.ppm
cjpeg -outfile orig.jpg orig.ppm

jpegtran -copy none -outfile white.jpg orig.jpg
jpegtran -copy none -grayscale -outfile gray.jpg orig.jpg

jpegtran -wipe 50x50+0+0 -outfile mask_white.jpg white.jpg
jpegtran -wipe 50x50+0+0 -outfile mask_gray.jpg gray.jpg

jpegtran -grayscale -outfile mask_white_gray.jpg mask_white.jpg

md5sum mask_white_gray.jpg
md5sum mask_gray.jpg

and

% jpegtran -version
libjpeg-turbo version 2.1.5 (build 20230203)

Solution

Let's take a closer look at the outputs we're seeing:

mask_gray.jpg (a grayscale image from the start) shows a 56x56 gray square.
mask_white.jpg (color) and mask_white_gray.jpg (color to grayscale) both show 64x64 gray squares.

It’s clear that we’re not getting 50x50 squares in any of these cases.

This is caused by JPEG's internal structure, which is based on 8x8 base blocks. The jpegtran -wipe command operates at the block level rather than the pixel level. This explains why mask_gray.jpg has a 56x56 gray square (56 = 8 * 7).
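The arithmetic behind that 56 can be sketched in a few lines of shell. This is only an illustration: round_up is a hypothetical helper name, and it assumes the wipe region starts at offset 0 (as in 50x50+0+0) and is expanded outward to the next block boundary, which is consistent with the sizes observed above.

```shell
#!/bin/sh
# Hypothetical helper: round a requested extent up to the next multiple
# of the block size. Assumes the region starts at offset 0.
round_up() {
    # $1 = requested size in pixels, $2 = block size in pixels
    echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}

# Grayscale case: 8-pixel blocks, 50 pixels requested.
round_up 50 8
```

With a request that is already a multiple of 8 (for example 48), the same helper returns the value unchanged.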

For color images, sampling factors can come into play: JPEG allows a block to represent more than 8x8 pixels from the original image, depending on a sampling factor specified per channel. To view these factors, we can use djpeg -verbose -fast white.jpg > /dev/null:

Component 1: 2hx2v q=0
Component 2: 1hx1v q=1
Component 3: 1hx1v q=1

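Here, 2hx2v is the sampling factor of the first component (luma). The two chroma components are 1hx1v, so they are subsampled by 2 in each direction relative to luma, and each 8x8 chroma block covers a 16x16 pixel area. The smallest unit that contains whole blocks of every component (the iMCU) is therefore 16x16 pixels. When calling jpegtran -wipe on such a color image, the gray square has sides that are multiples of 16 pixels, which explains the 64x64 gray square (64 = 16 * 4).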
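To make the link between the sampling factors and the wiped size concrete, here is a rough shell sketch. The function names imcu_size and wiped_extent are made up for this example, and it assumes that the iMCU side is 8 pixels times the largest sampling factor among the components, and that the wipe starts at offset 0.

```shell
#!/bin/sh
# Hypothetical helper: iMCU side in pixels, assumed to be 8 times the
# largest sampling factor among the components passed as arguments.
imcu_size() {
    max=0
    for f in "$@"; do
        if [ "$f" -gt "$max" ]; then
            max=$f
        fi
    done
    echo $(( 8 * max ))
}

# Hypothetical helper: extent actually wiped for a region starting at
# offset 0, rounded up to a whole number of iMCUs.
wiped_extent() {
    # $1 = requested size in pixels, $2 = iMCU side in pixels
    echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}

color=$(imcu_size 2 1 1)   # factors reported by djpeg for white.jpg
gray=$(imcu_size 1)        # single-component grayscale image

echo "color: iMCU ${color}px, wiped $(wiped_extent 50 "$color")px"
echo "gray: iMCU ${gray}px, wiped $(wiped_extent 50 "$gray")px"
```

Passing 1 1 1 to imcu_size (the factors you would expect after cjpeg -sample 1x1) gives an 8-pixel iMCU, the same as the grayscale case.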

When converting the color image to grayscale, jpegtran can't regenerate the data erased in the 64x64 area, which results in the gray rectangle not changing size.

If you want to ensure you get the same output in both cases, you can modify the cjpeg command to disable chroma subsampling, so that the color image also uses 8x8 pixel units:

cjpeg -sample 1x1 -outfile orig.jpg orig.ppm

That will ensure both images give you the same 56x56 gray block.

However, as pointed out by another commenter, if you have a practical application in mind and want precise control over the size of the wiped area, you should probably be using ImageMagick instead.

If you want to dig deeper and take a closer look at what's going on under the hood, the do_wipe function of jpegtran is only 20 lines and quite readable.