python opencv computer-vision conv-neural-network convolution

I wrote a convolution function, but it returns a "mosaic" image. Can anyone see what is wrong with my code?


I want to implement convolution by myself using NumPy and Python.

This is my code:

import cv2 as cv
import numpy as np

def add_padding_to_image(img, kernel):
    old_image_height, old_image_width, channels = img.shape

    padding_width = kernel.shape[0] // 2

    new_image_width = img.shape[1] + padding_width * 2
    new_image_height = img.shape[0] + padding_width * 2
    color = (0,0,0)
    result = np.full((new_image_height,new_image_width, channels), color, dtype=np.uint8)

    # compute center offset
    x_center = (new_image_width - old_image_width) // 2
    y_center = (new_image_height - old_image_height) // 2

    # copy img image into center of result image
    result[y_center:y_center+old_image_height, 
        x_center:x_center+old_image_width] = img
    
    return result


def convolve(img, kernel):
    height, width, color = img.shape
    kernel_h, kernel_w= kernel.shape
    img = img.copy()

    # loop through the image, for each pixel on the image
    for y in range(kernel_h//2, height - kernel_h // 2):
        y_start = y - kernel_h // 2
        y_end = y + kernel_h // 2

        for x in range(kernel_w//2, width- kernel_w // 2):
            # for each pixel on the image
            x_start = x - kernel_w// 2
            x_end = x + kernel_w // 2

            # get ready for loop through the image pixel * kernel
            kx = 0
            ky = 0

            b_sum = 0
            g_sum = 0
            r_sum = 0

            # loop through the neighbor of the image pixel
            for i in range(y_start, y_end + 1):
                for j in range(x_start, x_end + 1):


                    # print("i", i, "j", j, "kx", kx, "ky", ky )
                
                    # read the current neighbor image pixel
                    color = img[i][j]

                    # add to the sum 
                    b_sum = b_sum + color[0] * kernel[ky][kx]   
                    g_sum = g_sum + color[1] * kernel[ky][kx]
                    r_sum = r_sum + color[2] * kernel[ky][kx]
                    # move to kernel grid
                    kx = (kx+1)% kernel_w
                ky = (ky+1)% kernel_h
    
            img[y][x] = [b_sum, g_sum, r_sum]

                
    return img


# read image
img = cv.imread('cat.jpg')
outline = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1]
])
img_padding = add_padding_to_image(img, outline)
filter_image = convolve(img_padding, outline)

# # Show the image
cv.imshow("Image", img)
cv.imshow("Filter Image", filter_image)
cv.waitKey(0)
cv.destroyAllWindows()

add_padding_to_image adds a black (zero-valued) border around the image, and convolve does the convolution. I expect it to give the same result as cv.filter2D, for example:

outline = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1]
])

dst = cv.filter2D(img, -1, outline)

The cat image will be converted from:

[image: original cat photo]

to:

[image: result of cv.filter2D]

But my code gives me this:

[image: "mosaic" output of my code]

I have been stuck for days and don't know what I am missing. Can anyone take a look?

Thanks!


Solution

  • First, the line img[y][x] = [b_sum, g_sum, r_sum] writes into the same array you are still reading from, so later iterations of the loop compute their sums from already filtered neighbors instead of the original pixels.

    To fix that, change img = img.copy() at the top of convolve() to returned = img.copy(), and change return img at the bottom to return returned.

  • Second (and last), clamp the sums to the range 0-255 before storing them, so they do not wrap around when written into the uint8 array. Like this:

    b_sum = 255 if b_sum>255 else b_sum
    g_sum = 255 if g_sum>255 else g_sum
    r_sum = 255 if r_sum>255 else r_sum
    b_sum = 0 if b_sum<0 else b_sum
    g_sum = 0 if g_sum<0 else g_sum
    r_sum = 0 if r_sum<0 else r_sum
    
    returned[y][x] = [b_sum, g_sum, r_sum]
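
Putting both fixes together, the function could look roughly like the sketch below. It is only an illustration, not the asker's exact code: the name convolve_fixed is made up here, it assumes an odd-sized 2-D kernel and an already padded H x W x C uint8 image, and it uses np.clip and a slice-based window sum in place of the hand-written clamping and the two inner loops.

```python
import numpy as np

def convolve_fixed(img, kernel):
    """Filter the interior pixels of an already padded H x W x C uint8 image.

    Hypothetical helper name; assumes an odd-sized 2-D kernel.
    """
    kernel_h, kernel_w = kernel.shape
    pad_y, pad_x = kernel_h // 2, kernel_w // 2
    height, width, _ = img.shape

    # Fix 1: read from an untouched (and wider, signed) copy of the input,
    # and write results into a separate output array.
    src = img.astype(np.int64)
    returned = img.copy()

    for y in range(pad_y, height - pad_y):
        for x in range(pad_x, width - pad_x):
            # Neighborhood around (y, x), shape (kernel_h, kernel_w, C)
            window = src[y - pad_y:y + pad_y + 1, x - pad_x:x + pad_x + 1]
            # Weighted sum per channel; kernel is broadcast over the channels
            sums = (window * kernel[:, :, None]).sum(axis=(0, 1))
            # Fix 2: clamp to 0-255 before storing into the uint8 output
            returned[y, x] = np.clip(sums, 0, 255)

    return returned
```

With the code from the question this would be called as filter_image = convolve_fixed(img_padding, outline). The result still has the size of the padded image, so the border has to be cropped off to get back to the original size. Pixels near the edge may also still differ from cv.filter2D, because its default border mode reflects the image (BORDER_REFLECT_101) instead of padding with zeros.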