Speeding up square sample summing over image

I am attempting to create a 2D array from another, by taking the sum of NxN pixels around a point on the image, and saving the result at the same coordinate in the new image:

import cv2 as cv
import numpy as np

def sum_black(image: np.ndarray, size=11) -> np.ndarray:
    assert(size % 2)
    pad = (size - 1) // 2
    iH, iW = image.shape[:2]
    image = ((255 - image) / 255).astype(np.float32)
    image = cv.copyMakeBorder(image, pad, pad, pad, pad, cv.BORDER_CONSTANT, None, 0)
    output = np.zeros((iH, iW), dtype="float32")

    for y in range(iH):
        for x in range(iW):
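            # image is already padded, so the window centred on (y, x)
            # starts at (y, x) in padded coordinates and spans size pixels
            output[y, x] = image[y:y + size, x:x + size].sum()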

    output -= np.min(output)

    return 255 - (output / np.max(output)) * 255

However, looping over the input image and taking the sum around a point seems to be very slow on larger images. Is there any convenient method to speed this up, or do I have to implement concurrency with threads?

Solution

What you are trying to compute is a sliding-window sum. You could do this with NumPy/SciPy by convolving the 2D array with an NxN matrix of ones (for example with scipy.signal.convolve2d), but since you are already using OpenCV, I'll share the OpenCV method, which is typically faster.

cv2.boxFilter(image, -1, (size,size), normalize=False, borderType=cv2.BORDER_CONSTANT)

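boxFilter convolves the image with a size x size matrix of ones. By default the result is normalized by dividing by size*size, which gives the average. We want the sum, so we set normalize=False. borderType=cv2.BORDER_CONSTANT tells OpenCV to pad the edges of the image with a constant value (0 in this case). The -1 keeps the output depth equal to the input depth, so pass the float32 image as in your code; on a uint8 image the sums would saturate at 255.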