image opencv image-processing imagemagick

Easiest way to find an overlap of two images (screenshots)

I would like to create a utility that joins two screenshots into one. I have already seen websites and apps that do this, so there is no need to suggest those 😀

Example:

Take screenshot in iMessage.
Scroll half screen down.
Take another screenshot.

I would like to have a combined screenshot that is roughly 1.5 screens tall.

To do this I need to find the common parts at the top and bottom, and the "samish" overlapping area. The match is not exact because some apps change their gradients while scrolling.

I see a lot of recommendations to use OpenCV, but that seems aimed at panorama photos. My case seems much simpler (e.g. the images have the same width).

Is there an easier option? I looked through ImageMagick but did not find anything that would provide me with the overlap area.

Example images:

I would like to build a tool that combines them into one that shows a taller window with the entire page.

Solution

We may use a simple search in "sliding window" style.
Crop a window of, say, 5 rows from the top of the bottom image, and search for the matching position of those rows in the top image.

Since the pixel values in the overlapping part are almost equal, we may rely on the sum of squared differences.

Illustration:

                                              Bottom half of top image 
                                          |   ########################################
5 rows from bottom image                  |   ########################################
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  |   ########################################
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  |   ########################################
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  |   ########################################
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  |   ########################################
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  |   ########################################
Compute sum of squared difference         |   ########################################
Slide down                                |   ########################################
                                          V   ########################################

Note:
The "sum of squared differences" is a simple form of correlation.
In the general case, we assume possible gain and offset differences between the images, so we have to apply a more complex formula, as described in Wikipedia.
Since two screenshots of the same app have no gain/offset differences, we may use the "sum of squared differences" or the "sum of absolute differences".
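For reference, when gain and offset can differ (say, one screenshot was dimmed), a zero-mean normalized correlation is one common choice. The sketch below is not part of the solution in this answer; `zncc` is a hypothetical helper name, and the patches are random test data. It returns a value in [-1, 1], where 1 means a perfect match up to gain and offset.

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equally shaped arrays."""
    a = a.astype(np.float64).ravel()   # astype makes a copy, so the input is untouched
    b = b.astype(np.float64).ravel()
    a -= a.mean()                      # removing the mean cancels any offset
    b -= b.mean()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))  # normalizing cancels any gain
    if denom == 0:
        return 0.0                     # at least one patch is flat; correlation is undefined
    return float(np.dot(a, b) / denom)

rng = np.random.default_rng(0)
patch = rng.integers(0, 100, size=(5, 40, 3))
brighter = patch * 2 + 10              # same content with gain 2 and offset 10
other = rng.integers(0, 100, size=(5, 40, 3))

print(zncc(patch, brighter))           # close to 1.0
print(zncc(patch, other))              # much lower
```

A plain sum of squared differences would report a large error between `patch` and `brighter`, which is why it is only suitable when the two images have the same brightness, as screenshots do.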

Crop a "window" of 5 rows from the top of the bottom image:

    win = bot[0:5, :, :].astype(np.int32)
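The cast to `np.int32` matters: images loaded with OpenCV are `uint8`, and subtracting `uint8` arrays wraps around modulo 256 instead of going negative. A small illustration with made-up pixel values:

```python
import numpy as np

a = np.array([10, 200], dtype=np.uint8)
b = np.array([20, 100], dtype=np.uint8)

wrapped = a - b                                   # uint8 arithmetic wraps modulo 256
signed = a.astype(np.int32) - b.astype(np.int32)  # true signed differences

print(wrapped)                   # [246 100]
print(signed)                    # [-10 100]
print(np.sum(signed * signed))   # 10100
```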

Slide the "window" down over the bottom half of the top image, and compute the sum of squared differences at each position:

    beg_y = top.shape[0]//2
    sqdif_arr = np.zeros(top.shape[0]-2-beg_y, np.int64)  # One entry per y in the loop below (also correct for odd heights).
    for y in range(beg_y, top.shape[0]-2):
        dif = top[y-2:y+3, :, :].astype(np.int32) - win
        sum_sqdif = np.sum(dif*dif, dtype=np.int64)
        sqdif_arr[y-beg_y] = sum_sqdif

Find the index with minimum value:

    y = sqdif_arr.argmin() + beg_y
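The three steps above can be collected into one function and checked without image files. This is a sketch under assumptions: `find_overlap_row` is a hypothetical name, the window height is a parameter instead of the fixed 5 rows, and the result is the top row of the best match rather than its center row. The synthetic "screenshots" are two slices of one random page, so the true offset is known.

```python
import numpy as np

def find_overlap_row(top, bot, win_rows=5):
    """Return the row of `top` where the first `win_rows` rows of `bot` match best."""
    win = bot[:win_rows].astype(np.int32)
    beg_y = top.shape[0] // 2            # search only the bottom half of `top`
    last_y = top.shape[0] - win_rows     # last row where the window still fits
    sqdif = np.empty(last_y - beg_y + 1, np.int64)
    for y in range(beg_y, last_y + 1):
        dif = top[y:y + win_rows].astype(np.int32) - win
        sqdif[y - beg_y] = np.sum(dif * dif, dtype=np.int64)
    return int(sqdif.argmin()) + beg_y

# Synthetic data: a 300-row "page" and two 200-row screenshots scrolled by 100 rows.
rng = np.random.default_rng(0)
page = rng.integers(0, 256, size=(300, 64, 3), dtype=np.uint8)
top, bot = page[:200], page[100:]

y = find_overlap_row(top, bot)
stitched = np.vstack((top[:y], bot))     # keep `top` above the match, then all of `bot`
print(y, stitched.shape)                 # 100 (300, 64, 3)
```

Real screenshots still need the fixed header of the bottom image cropped away first, as in the full code sample.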

Note:
To keep the solution simple, I cropped the irrelevant (top) rows from the bottom image, and a few columns from each side.

Code sample:

import numpy as np
import cv2
from matplotlib import pyplot as plt

top = cv2.imread('top.png')
bot = cv2.imread('bottom.png')

# Crop the relevant part - in order to make the solution simpler.
top = top[:, 30:890, :]
bot = bot[90:, 30:890, :]

win = bot[0:5, :, :].astype(np.int32)

beg_y = top.shape[0]//2
sqdif_arr = np.zeros(top.shape[0]-2-beg_y, np.int64)  # One entry per y in the loop below (also correct for odd heights).

for y in range(beg_y, top.shape[0]-2):
    dif = top[y-2:y+3, :, :].astype(np.int32) - win
    sum_sqdif = np.sum(dif*dif, dtype=np.int64)
    sqdif_arr[y-beg_y] = sum_sqdif

y = sqdif_arr.argmin() + beg_y  # Get the index with minimum value (add beg_y offset).

# Concatenate the top and bottom images at the position we found.
top = top[0:y-2, :, :]
top_bot = np.vstack((top, bot))

plt.plot(sqdif_arr)  # Show graph for testing
plt.show(block=False)

cv2.imshow('win', win.astype(np.uint8))  # Show win image for testing
cv2.imshow('top_bot', top_bot)  # Show concatenated image for testing
cv2.waitKey()
cv2.destroyAllWindows()

Plot of sqdif_arr (index with minimum value is 192):

win:

top_bot (Concatenated top and bottom images):

Notes:

There are cases that are not handled here: for example, it is not going to work if all pixels in win are white.
There are more efficient ways to implement a sliding window than computing the difference over all the pixels at every iteration.
5 rows is just an example; you may select, say, 101 rows to improve robustness.
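On the efficiency note: one option (an assumption on my part, not something the code above does) is to expand the squared difference as sum((t - w)^2) = sum(t^2) - 2*sum(t*w) + sum(w^2). The t*w term for every window position then comes out of a single matrix product. `ssd_all_positions` is a hypothetical helper; it works in float64, which represents these integer sums exactly as long as they stay below 2^53.

```python
import numpy as np

def ssd_all_positions(top, win):
    """SSD between `win` and every same-height slice of `top`, without a Python loop over pixels."""
    k = win.shape[0]
    t = top.reshape(top.shape[0], -1).astype(np.float64)  # one flattened row per image row
    w = win.reshape(k, -1).astype(np.float64)
    n = t.shape[0] - k + 1                                 # number of window positions

    # Per-row energy of `top`, summed over each k-row slice via a cumulative sum.
    csum = np.concatenate(([0.0], np.cumsum(np.sum(t * t, axis=1))))
    top_energy = csum[k:] - csum[:-k]

    # cross[y] = sum_i dot(t[y + i], w[i]), read off the diagonals of t @ w.T.
    g = t @ w.T                                            # shape (rows of top, k)
    cross = sum(g[i:i + n, i] for i in range(k))

    return top_energy - 2.0 * cross + np.sum(w * w)

rng = np.random.default_rng(1)
top = rng.integers(0, 256, size=(120, 32, 3), dtype=np.uint8)
win = top[70:75]                       # a window whose true position is row 70

ssd = ssd_all_positions(top, win)
print(int(ssd.argmin()))               # 70
```

To search only the bottom half as in the code sample, pass `top[beg_y:]` and add `beg_y` to the index of the minimum.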