Search code examples
pythonnumpyopencvstride

NumPy template matching SQDIFF with `sliding window_view`


The SQDIFF is defined as openCV definition. (I believe they omit channels)

SQDIFF

Which in junior numpy Python should be

A = np.arange(27, dtype=np.float32)
A = A.reshape(3,3,3) # The "image"
B = np.ones([2, 2, 3], dtype=np.float32) # window
rw, rh = A.shape[0] - B.shape[0] + 1, A.shape[1] - B.shape[1] + 1 # End result size
result = np.zeros([rw, rh])
for i in range(rw):
    for j in range(rh):
        w = A[i:i + B.shape[0], j:j + B.shape[1]]
        res =  B - w
        result[i, j] = np.sum(
            res ** 2
        )
cv_result = cv.matchTemplate(A, B, cv.TM_SQDIFF) # this result is the same as the simple for loops
assert np.allclose(cv_result, result)

This is comparatively slow solution. I have read about sliding_window_view but cannot get it correct.

# This will fail with these large arrays but is ok for smaller ones
A = np.random.rand(1028, 1232, 3).astype(np.float32)
B = np.random.rand(248, 249, 3).astype(np.float32)
locations = np.lib.stride_tricks.sliding_window_view(A, B.shape)
sqdiff = np.sum((B - locations) ** 2, axis=(-1,-2, -3, -4)) # This will fail with normal sized images

will fail with MemoryError even if the result easily fits to memory. How can I produce similar results to the cv2.matchTemplate function with this faster way?


Solution

  • As a last resort, you may perform the computation in tiles, instead of computing "all at once".

    np.lib.stride_tricks.sliding_window_view returns a view of the data, so it doesn't consume a lot of RAM.

    The expression B - locations can't use a view, and requires the RAM for storing an array with shape (781, 984, 1, 248, 249, 3) of float elements.

    The total RAM for storing B - locations is 781*984*1*248*249*3*4 = 569,479,908,096 bytes.


    For avoiding the need for storing B - locations at the RAM at once, we may compute sqdiff in tiles, when "tile" computation requires less RAM.

    A simple tiles division is using every row as a tile - loop over the rows of sqdiff, and compute the output row by row.

    Example:

    sqdiff = np.zeros((locations.shape[0], locations.shape[1]), np.float32)  # Allocate an array for storing the result.
    
    # Compute sqdiff row by row instead of computing all at once.
    for i in range(sqdiff.shape[0]):
        sqdiff[i, :] = np.sum((B - locations[i, :, :, :, :, :]) ** 2, axis=(-1, -2, -3, -4))
    

    Executable code sample:

    import numpy as np
    import cv2
    
    A = np.random.rand(1028, 1232, 3).astype(np.float32)
    B = np.random.rand(248, 249, 3).astype(np.float32)
    locations = np.lib.stride_tricks.sliding_window_view(A, B.shape)
    
    cv_result = cv2.matchTemplate(A, B, cv2.TM_SQDIFF)  # this result is the same as the simple for loops
    
    #sqdiff = np.sum((B - locations) ** 2, axis=(-1, -2, -3, -4))  # This will fail with normal sized images
    
    sqdiff = np.zeros((locations.shape[0], locations.shape[1]), np.float32)  # Allocate an array for storing the result.
    
    # Compute sqdiff row by row instead of computing all at once.
    for i in range(sqdiff.shape[0]):
        sqdiff[i, :] = np.sum((B - locations[i, :, :, :, :, :]) ** 2, axis=(-1, -2, -3, -4))
    
    assert np.allclose(cv_result, sqdiff)
    

    I know the solution is a bit disappointing... But it is the only generic solution I could find.