Tags: image, opencv, image-processing, imagemagick

Easiest way to find an overlap of two images (screenshots)


I would like to create a utility that joins two screenshots into one. I have already seen websites and apps that do this, so no need to suggest those 😀

Example:

  1. Take screenshot in iMessage.
  2. Scroll half screen down.
  3. Take another screenshot.

I would like to have a combined screenshot that is roughly 1.5 screens tall.

To do this, I need to find the common parts at the top and bottom, and the "samish" overlapping area. It is not a precise match because some apps have gradients that shift while scrolling.

I see a lot of recommendations to use OpenCV, but that seems to be meant for panorama photos. My case seems much simpler (e.g. the images are the same width).

Is there an easier option? I looked through ImageMagick but did not find anything that would give me the overlap area.

Example images:

[image: top screenshot] [image: bottom screenshot]

I would like to build a tool that combines them into one that shows a taller window with the entire page.


Solution

  • We may use a simple search in "sliding window" style.
    Crop a window of, say, 5 rows from the top of the bottom image, and search for the position in the top image where those rows match.

    Since the pixel values in the overlapping part are almost equal, we may rely on the sum of squared differences.

    Illustration:

                                                  Bottom half of top image 
                                              |   ########################################
    5 rows from bottom image                  |   ########################################
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  |   ########################################
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  |   ########################################
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  |   ########################################
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  |   ########################################
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  |   ########################################
    Compute sum of squared differences        |   ########################################
    Slide down                                |   ########################################
                                              V   ########################################
    

    Note:
    The "sum of squared differences" is a simple similarity measure, closely related to correlation.
    In the general case, we have to assume possible gain and offset differences between the images, which calls for a more complex formula (such as the normalized correlation described on Wikipedia).
    Since two screenshots of the same screen have no gain/offset differences, we may use the "sum of squared differences" or the "sum of absolute differences".
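To make the note above concrete, here is a minimal sketch comparing the three measures on a small synthetic patch. The function names `ssd`, `sad` and `zncc` are my own (hypothetical), and `zncc` is one common normalized form, not necessarily the exact formula the linked article uses.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; 0 means the patches are identical."""
    d = a.astype(np.int64) - b.astype(np.int64)
    return int(np.sum(d * d))

def sad(a, b):
    """Sum of absolute differences; 0 means the patches are identical."""
    d = a.astype(np.int64) - b.astype(np.int64)
    return int(np.sum(np.abs(d)))

def zncc(a, b):
    """Zero-mean normalized cross-correlation; insensitive to gain and offset."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

# Synthetic patch (values below 100 so that gain 2 + offset 10 still fits in uint8).
rng = np.random.default_rng(0)
patch = rng.integers(0, 100, size=(5, 40, 3), dtype=np.uint8)
brighter = (patch.astype(np.int64) * 2 + 10).astype(np.uint8)

print(ssd(patch, patch), sad(patch, patch))  # identical patches score 0
print(ssd(patch, brighter) > 0)              # a gain/offset change breaks SSD
print(zncc(patch, brighter))                 # but ZNCC stays very close to 1
```

For screenshots the cheaper SSD/SAD is enough, because both images come from the same renderer; the normalized form only pays off when brightness or contrast can differ between the two images.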

    • Crop a "window" of 5 rows from the top of the bottom image:
        win = bot[0:5, :, :].astype(np.int32)
    
    • Slide the "window" down over the bottom half of the top image, and compute the sum of squared differences at each position (y is the center row of the window):
        beg_y = top.shape[0]//2
        sqdif_arr = np.zeros(top.shape[0]-2-beg_y, np.int64)  # One entry per tested position
        for y in range(beg_y, top.shape[0]-2):
            dif = top[y-2:y+3, :, :].astype(np.int32) - win
            sum_sqdif = np.sum(dif*dif, dtype=np.int64)
            sqdif_arr[y-beg_y] = sum_sqdif
    
    • Find the index with the minimum value:
        y = sqdif_arr.argmin() + beg_y
    

    Note:
    To keep the solution simple, I cropped the irrelevant (top) rows from the bottom image, and a few columns from each side.
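The crop values in the code sample are hard-coded for these two screenshots. As a rough sketch of how the fixed header and footer could be detected instead, the hypothetical helper below (`count_static_rows` and the `tol` threshold are my own, not part of the solution) counts how many rows at the top or bottom are nearly identical in both images. It assumes both images have the same width, and that anything changing inside the header (a clock, for example) stays under the tolerance.

```python
import numpy as np

def count_static_rows(a, b, from_top=True, tol=2.0):
    """Count leading (or trailing) rows whose mean absolute difference is <= tol."""
    n = min(a.shape[0], b.shape[0])
    if not from_top:
        # Flip vertically so that "trailing rows" become "leading rows".
        a, b = a[::-1], b[::-1]
    dif = np.abs(a[:n].astype(np.int32) - b[:n].astype(np.int32))
    row_err = dif.reshape(n, -1).mean(axis=1)   # one error value per row
    changed = np.nonzero(row_err > tol)[0]      # rows that differ
    return int(changed[0]) if changed.size else n

# Synthetic check: shared 20-row header and 10-row footer, different bodies.
rng = np.random.default_rng(2)
header = rng.integers(0, 256, size=(20, 40, 3), dtype=np.uint8)
footer = rng.integers(0, 256, size=(10, 40, 3), dtype=np.uint8)
body1 = rng.integers(0, 256, size=(50, 40, 3), dtype=np.uint8)
body2 = rng.integers(0, 256, size=(50, 40, 3), dtype=np.uint8)
shot1 = np.vstack((header, body1, footer))
shot2 = np.vstack((header, body2, footer))

print(count_static_rows(shot1, shot2, from_top=True))   # header height
print(count_static_rows(shot1, shot2, from_top=False))  # footer height
```

The returned counts could then replace the constant in a crop such as `bot[90:, ...]`; I have not tried this on the screenshots from the question.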

    Code sample:

    import numpy as np
    import cv2
    from matplotlib import pyplot as plt
    
    top = cv2.imread('top.png')
    bot = cv2.imread('bottom.png')
    
    # Crop the relevant part - in order to make the solution simpler.
    top = top[:, 30:890, :]
    bot = bot[90:, 30:890, :]
    
    win = bot[0:5, :, :].astype(np.int32)
    
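    beg_y = top.shape[0]//2  # Search only the bottom half of the top image.
    # One entry per tested position (sized from the loop range, so an odd image height does not overflow the array).
    sqdif_arr = np.zeros(top.shape[0]-2-beg_y, np.int64)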
    
    for y in range(beg_y, top.shape[0]-2):
        dif = top[y-2:y+3, :, :].astype(np.int32) - win
        sum_sqdif = np.sum(dif*dif, dtype=np.int64)
        sqdif_arr[y-beg_y] = sum_sqdif
    
    y = sqdif_arr.argmin() + beg_y  # Get the index with minimum value (add beg_y offset).
    
    # Concatenate the top and bottom images at the position we found.
    # Row y of top matches the center row of win, so top is cut at row y-2, where bot begins.
    top = top[0:y-2, :, :]
    top_bot = np.vstack((top, bot))
    
    plt.plot(sqdif_arr)  # Show graph for testing
    plt.show(block=False)
    
    cv2.imshow('win', win.astype(np.uint8))  # Show win image for testing
    cv2.imshow('top_bot', top_bot)  # Show concatenated image for testing
    cv2.waitKey()
    cv2.destroyAllWindows()
    

    Plot of sqdif_arr (index with minimum value is 192):
    [image: plot of sqdif_arr]

    win:
    [image: win]

    top_bot (concatenated top and bottom images):
    [image: top_bot]
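If you want to check the search logic without the screenshot files, the same idea can be wrapped in a function and run on synthetic data. This is only a sketch: `find_overlap_row` and `stitch` are hypothetical names, and unlike the code sample above, the returned `y` is the first matching row in `top` rather than the center of the window.

```python
import numpy as np

def find_overlap_row(top, bot, rows=5):
    """Return the row of top that best matches the first `rows` rows of bot."""
    win = bot[:rows].astype(np.int32)
    beg_y = top.shape[0] // 2            # search only the bottom half of top
    last_y = top.shape[0] - rows         # last position where the window fits
    scores = np.empty(last_y - beg_y + 1, dtype=np.int64)
    for y in range(beg_y, last_y + 1):
        dif = top[y:y + rows].astype(np.int32) - win
        scores[y - beg_y] = np.sum(dif * dif, dtype=np.int64)
    return int(scores.argmin()) + beg_y

def stitch(top, bot, rows=5):
    """Cut top at the matching row and append bot below it."""
    y = find_overlap_row(top, bot, rows)
    return np.vstack((top[:y], bot))

# Synthetic "page": two overlapping crops of one tall random image.
rng = np.random.default_rng(1)
page = rng.integers(0, 256, size=(300, 40, 3), dtype=np.uint8)
top = page[:200]    # rows 0..199
bot = page[120:]    # rows 120..299, so bot row 0 equals top row 120

print(find_overlap_row(top, bot))                # 120
print(np.array_equal(stitch(top, bot), page))    # True
```

Random noise is the easy case, of course; real screenshots have flat and repeated regions, which is what the notes below are about.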


    Notes:

    • Some cases are not handled here: for example, it is not going to work if all pixels in win are white.
    • There are more efficient ways to implement a sliding window than recomputing the difference over all the pixels at every iteration.
    • 5 rows is just an example; you may select 101 rows (for example) to improve robustness.
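As a rough illustration of the first two notes, the sketch below skips windows that have no texture and computes all the scores in one vectorized call. The names `is_flat`, `ssd_all_positions`, `find_overlap_row_robust` and the `min_std` threshold are my own assumptions. It needs NumPy 1.20 or later for `sliding_window_view`, and it trades memory for speed because the difference array for all positions is materialized at once. It also assumes the overlap is tall enough to contain the first textured window of the bottom image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def is_flat(win, min_std=1.0):
    """True if the window has almost no texture (e.g. all white)."""
    return float(win.std()) < min_std

def ssd_all_positions(region, win):
    """SSD between win and every vertical position in region, without a Python loop."""
    rows = win.shape[0]
    # Shape: (positions, width, channels, rows); the window axis comes last.
    views = sliding_window_view(region, rows, axis=0)
    dif = views.astype(np.int32) - np.moveaxis(win, 0, -1).astype(np.int32)
    return np.sum(dif * dif, axis=(1, 2, 3), dtype=np.int64)

def find_overlap_row_robust(top, bot, rows=5, min_std=1.0):
    """Row of top that lines up with row 0 of bot, or None if bot is flat everywhere."""
    beg_y = top.shape[0] // 2
    for offset in range(0, bot.shape[0] - rows + 1, rows):
        win = bot[offset:offset + rows]
        if is_flat(win, min_std):
            continue  # nothing to match on, try the next window down
        scores = ssd_all_positions(top[beg_y:], win)
        # The window was taken `offset` rows below the top of bot.
        return int(scores.argmin()) + beg_y - offset
    return None

# Synthetic check: the first 10 rows of bot are pure white.
rng = np.random.default_rng(3)
page = rng.integers(0, 256, size=(300, 40, 3), dtype=np.uint8)
page[120:130] = 255
top, bot = page[:200], page[120:]
white = np.full((50, 40, 3), 255, dtype=np.uint8)

print(find_overlap_row_robust(top, bot))    # 120
print(find_overlap_row_robust(top, white))  # None
```

For larger windows (the 101 rows mentioned above) the temporary array grows with the window height, so a loop or a library template-matching routine may be the better trade-off there.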