image-processing python-imaging-library jpeg

Getting different RGB values after saving and re-loading an image via PIL in Ubuntu 18.04 vs any other OS

Whenever I load an image, save it at quality 90, reload the saved file, and print the sum of its RGB array, I get one value on Ubuntu 18.04.5 and CentOS 8.2 and a different value on Ubuntu 20.04+, Fedora 33, and Windows 10. I tested with the same versions of Pillow (PIL), NumPy, and Python on all of these operating systems, and the discrepancy persists.

from PIL import Image
import numpy as np
img = Image.open('Sp_D_CNN_A_art0024_ani0032_0268.jpg')
np.sum(np.array(img))

OUTPUT : 28586794    (Same for all the OS)

img.save('temp.jpg', 'JPEG', quality = 90)
tempimg = Image.open('temp.jpg')
np.sum(np.array(tempimg))

OUTPUT : 28588237    (for Ubuntu 18.04.5 and CentOS 8.2.2004)
         28588547    (for Ubuntu 20.04+, Fedora 33, Windows 10 20H2)

The difference may look slight, but it becomes large after further processing by my Error Level Analysis algorithm. Because I trained my segmentation model on Google Colab (whose runtime uses Ubuntu 18.04.5), the generated mask comes out very inaccurate on Ubuntu 20.04+, Fedora 33, and Windows 10 20H2.

Why is that happening and how can I fix it?

Solution

The culprit is the underlying libjpeg version used by the Pillow (PIL) library. Although Ubuntu 18.04.5 and Ubuntu 20.04.1 both have the same libjpeg package, the Pillow version installed by default on Ubuntu 18.04.5 ships with its own binaries. One solution is to remove the current Pillow and reinstall it by building from source, so that it is linked against the system's libjpeg.
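To see which JPEG library a given Pillow install is actually using, something like the following can be run on each machine and the results compared. This is only a minimal sketch: it assumes a reasonably recent Pillow in which `PIL.features.version` is available, and `jpeg_backend_info` is a hypothetical helper name, not part of Pillow.

```python
import PIL
from PIL import features


def jpeg_backend_info():
    """Collect what Pillow reports about its JPEG support (hypothetical helper)."""
    return {
        # Version of the Pillow package itself.
        "pillow": PIL.__version__,
        # True if this Pillow build was compiled with JPEG support.
        "jpeg_supported": features.check("jpg"),
        # Version string of the JPEG library Pillow reports, or None if unknown.
        "jpeg_library_version": features.version("jpg"),
        # True if Pillow reports that it is using libjpeg-turbo.
        "libjpeg_turbo": bool(features.check_feature("libjpeg_turbo")),
    }


print(jpeg_backend_info())
```

If the reported JPEG library version differs between two machines, that would be consistent with the explanation above, even when the Pillow version itself is identical.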

First, remove the current Pillow module:

python3 -m pip uninstall Pillow

Then install the dependency packages listed here: https://pillow.readthedocs.io/en/stable/installation.html#building-on-linux

And then run:

python3 -m pip install Pillow --no-binary :all:
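After reinstalling, one way to check that two machines now agree is to hash the pixels of a JPEG round trip and compare the digests. The sketch below is an assumption-laden illustration, not part of the original fix: `roundtrip_digest` is a hypothetical helper, and it uses a synthetic gradient image so it does not depend on any particular file.

```python
import hashlib
import io

from PIL import Image


def roundtrip_digest(img, quality=90):
    """Encode img as JPEG in memory, decode it again, and hash the decoded pixels."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, "JPEG", quality=quality)
    buf.seek(0)
    decoded = Image.open(buf).convert("RGB")
    return hashlib.sha256(decoded.tobytes()).hexdigest()


# Build a 64x64 synthetic RGB gradient (illustrative test data only).
pixels = bytearray()
for y in range(64):
    for x in range(64):
        pixels += bytes(((x * 4) % 256, (y * 4) % 256, ((x + y) * 2) % 256))
test_img = Image.frombytes("RGB", (64, 64), bytes(pixels))

# Run this on each machine and compare the printed digests.
print(roundtrip_digest(test_img))
```

Matching digests mean the encode and decode path produces identical pixels on both systems; a mismatch suggests the two Pillow installs are still linked against different JPEG libraries.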