SSIM is large but the two images are not similar at all

I want to use the structural similarity index measure for computing the mean structural similarity index between two images: the original one and the reconstructed one.

This is the original one:


While this is the reconstructed one:


Unfortunately if I run this script:

import numpy as np
from skimage.metrics import structural_similarity as ssim

# Both original and reconstructed have shape (40, 40, 1)
# dtype is 'float32'
original = np.load("original.npy")
reconstructed = np.load("reconstructed.npy")

ssim(original, reconstructed, data_range=1, channel_axis=-1)  # 0.9321383

Notice that the value is very high, around 0.93, even though the reconstructed image is not similar at all to the original one!

What am I missing? Note that the scikit-image version is scikit-image~=0.21.0.

What I tried 1: I think there is an issue with the data_range=1 parameter, since the documentation says:

The data range of the input image (distance between minimum and maximum possible values). By default, this is estimated from the image data type. This estimate may be wrong for floating-point image data. Therefore it is recommended to always pass this value explicitly. If data_range is not specified, the range is automatically guessed based on the image data type. However for floating-point image data, this estimate yields a result double the value of the desired range, as the dtype_range in skimage.util.dtype.py has defined intervals from -1 to +1. This yields an estimate of 2, instead of 1, which is most often required when working with image data (as negative light intensities are nonsensical). In case of working with YCbCr-like color data, note that these ranges are different per channel (Cb and Cr have double the range of Y), so one cannot calculate a channel-averaged SSIM with a single call to this function, as identical ranges are assumed for each channel.

Does it make sense to set data_range=original.max()-original.min() instead of data_range=1? What do you think? That way I would get ssim = 0.059593666.

In case it helps, here is some additional information:

original.min(), original.max()  # (0.7764345, 0.82100683)
reconstructed.min(), reconstructed.max()  # (0.6095292, 0.65623206)
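To see why a generous data_range inflates the score, here is a minimal pure-NumPy sketch of the SSIM formula computed over a single window. This is not the skimage implementation (the library averages SSIM over small sliding windows, so its numbers will differ), and the arrays are synthetic stand-ins with ranges similar to the ones above, not the actual .npy files. The point is that data_range sets the stabilizing constants C1 and C2, and when it is much larger than the actual intensity spread, those constants dominate and push the score toward 1:

```python
import numpy as np

def global_ssim(x, y, data_range, k1=0.01, k2=0.03):
    # Single-window SSIM (Wang et al. formula over the whole image).
    # data_range enters only through the stabilizing constants c1, c2.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    luminance = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
    contrast_structure = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure

# Two unrelated, nearly flat images with narrow intensity ranges,
# mimicking the min/max values reported above (hypothetical data).
rng = np.random.default_rng(0)
original = 0.776 + 0.045 * rng.random((40, 40))
reconstructed = 0.610 + 0.047 * rng.random((40, 40))

# data_range=1 makes c2 much larger than the tiny variances and
# covariance, so the contrast/structure term is pushed toward 1.
inflated = global_ssim(original, reconstructed, data_range=1.0)

# data_range matching the actual intensity spread keeps the constants
# small, so the lack of real correlation shows up in the score.
spread = max(original.max(), reconstructed.max()) - min(original.min(), reconstructed.min())
honest = global_ssim(original, reconstructed, data_range=spread)

print(inflated, honest)  # inflated is much larger than honest
```

With this sketch, the same pair of images scores high with data_range=1 and low once data_range reflects the real spread, which mirrors the behavior of the full skimage call.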

What I tried 2: I normalized both the original and the reconstructed images this way:

original = np.load("original.npy")
reconstructed = np.load("reconstructed.npy")

original = (original - np.min(original)) / (np.max(original) - np.min(original))
reconstructed = (reconstructed - np.min(reconstructed)) / (np.max(reconstructed) - np.min(reconstructed))

ssim(original, reconstructed, data_range=1, channel_axis=-1)  # 0.056520723

What do you think is the proper solution?


  • In your case, the two images having such a small range compared to the declared range (data_range=1) means that they are both more or less flat: they deviate little from a uniform gray value, and so they are structurally very similar. This means that, if you were to display these images using a range of 1 for the intensities (e.g. setting black at 0 and white at 1), you would see little difference between the two:

    import matplotlib.pyplot as plt

    # shape (40, 40, 1): drop the channel axis before display
    plt.imshow(original.squeeze(), cmap="gray", vmin=0, vmax=1); plt.show()
    plt.imshow(reconstructed.squeeze(), cmap="gray", vmin=0, vmax=1); plt.show()

    Normalizing to a range of 1 and setting data_range=1 has essentially the same effect as keeping the raw intensities and setting data_range=original.max()-original.min(). Both make the actual range of image intensities match the range reported to the SSIM algorithm.

    Though I would probably scale reconstructed in the same way as original when normalizing:

    # avoid shadowing the built-in min/max
    lo, hi = np.min(original), np.max(original)
    original = (original - lo) / (hi - lo)
    reconstructed = (reconstructed - lo) / (hi - lo)

    This way the comparison between the two images is not affected by potentially different per-image scaling. The per-image scaling and offset should have little bearing on the computed SSIM, but it might have a large impact later if/when you choose to change your comparison algorithm.
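    The effect of a shared normalization can be sanity-checked without skimage: one affine map applied to both arrays rescales every pixel difference by the same factor, so the relationship between the two images is preserved exactly. A small sketch with synthetic stand-ins for the two .npy files (hypothetical data, not the asker's arrays):

    ```python
    import numpy as np

    # Hypothetical stand-ins with the same shape/dtype as the question's data.
    rng = np.random.default_rng(1)
    original = (0.776 + 0.045 * rng.random((40, 40, 1))).astype(np.float32)
    reconstructed = (0.610 + 0.047 * rng.random((40, 40, 1))).astype(np.float32)

    # Shared normalization: both images mapped with original's min/max.
    lo, hi = original.min(), original.max()
    orig_n = (original - lo) / (hi - lo)
    rec_n = (reconstructed - lo) / (hi - lo)

    # Every pixel difference is scaled by the same factor 1/(hi - lo);
    # per-image normalization would stretch each image differently
    # and break this relationship.
    print(np.allclose(orig_n - rec_n,
                      (original - reconstructed) / (hi - lo),
                      rtol=1e-4, atol=1e-6))  # True
    ```

    Note that rec_n may fall outside [0, 1] here, which is fine: the point of the shared map is to preserve the relative offset between the two images rather than to force each onto the full range.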