Histogram comparison between two images

I am new to Histogram comparisons.

This code uses these images to make a histogram comparison. The result was impressive with a 0.99 %, however I think that the result resulted in 99% because of the background color. Can someone tell me how can I ignore the white color and compare the actual fruit.

The following code was found here.

# Load the images
img1 = cv2.imread('D:/downloads/app1.jpg')
img2 = cv2.imread('D:/downloads/app2.jpg')

# Convert it to HSV
img1_hsv = cv2.cvtColor(img1, cv2.COLOR_BGR2HSV)
img2_hsv = cv2.cvtColor(img2, cv2.COLOR_BGR2HSV)

# Calculate the histogram and normalize it
hist_img1 = cv2.calcHist([img1_hsv], [0,1], None, [180,256], [0,180,0,256])
cv2.normalize(hist_img1, hist_img1, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX);
hist_img2 = cv2.calcHist([img2_hsv], [0,1], None, [180,256], [0,180,0,256])
cv2.normalize(hist_img2, hist_img2, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX);

# find the metric value
metric_val = cv2.compareHist(hist_img1, hist_img2, cv2.HISTCMP_BHATTACHARYYA)

Solution

Using some mask as Fred suggested seems to be the cleanest solution, but Fred's comment regarding the HSV color space is even more important here! But, first of all, the reported metric value of 0.99... (also in the linked article) was obtained using cv2.HISTCMP_CORREL, not using cv2.HISTCMP_BHATTACHARYYA!

Now, let's stick to OpenCV's common BGR color space, and adapt the code:

import cv2

# Load the images
img1 = cv2.imread('app1.png')
img2 = cv2.imread('app2.png')

# Calculate the histograms, and normalize them
hist_img1 = cv2.calcHist([img1], [0, 1, 2], None, [256, 256, 256], [0, 256, 0, 256, 0, 256])
cv2.normalize(hist_img1, hist_img1, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)
hist_img2 = cv2.calcHist([img2], [0, 1, 2], None, [256, 256, 256], [0, 256, 0, 256, 0, 256])
cv2.normalize(hist_img2, hist_img2, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)

# Find the metric value
metric_val = cv2.compareHist(hist_img1, hist_img2, cv2.HISTCMP_CORREL)
print(metric_val)
# 0.9995753648895891

The metric value is still something at 99.9 %.

So, now, let's ignore all white pixels by manually setting hist_imgx[255, 255, 255] = 0:

import cv2

# Load the images
img1 = cv2.imread('app1.png')
img2 = cv2.imread('app2.png')

# Calculate the histograms, set bin for (255, 255, 255) to 0, and normalize them
hist_img1 = cv2.calcHist([img1], [0, 1, 2], None, [256, 256, 256], [0, 256, 0, 256, 0, 256])
hist_img1[255, 255, 255] = 0
cv2.normalize(hist_img1, hist_img1, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)
hist_img2 = cv2.calcHist([img2], [0, 1, 2], None, [256, 256, 256], [0, 256, 0, 256, 0, 256])
hist_img2[255, 255, 255] = 0
cv2.normalize(hist_img2, hist_img2, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)

# Find the metric value
metric_val = cv2.compareHist(hist_img1, hist_img2, cv2.HISTCMP_CORREL)
print(metric_val)
# 0.6199666001215806

And, the metric value drops to 62 %!

So, your assumption seems to be correct, the white background distorts the whole histogram comparison.

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.16299-SP0
Python:        3.9.1
PyCharm:       2021.1.1
OpenCV:        4.5.1
----------------------------------------