
Detecting circle-like shapes on binary images with lots of noise

I am trying to detect black-and-white soccer balls almost purely with image pre-processing techniques in OpenCV (in Python). My idea is as follows:

1. Process the image (for example, into a blurred binary image).
2. Find multiple 'candidates' for the soccer ball (for example, by contour detection).
3. Resize these candidates (for example, to 48x48 px) and feed their pixel values as booleans (0 = black pixel, 1 = white pixel) into a very simple neural network, which then outputs a confidence value for each candidate.
4. Determine whether a soccer ball is present in the photo and, if so, its most likely location.

I'm stuck on finding the right candidates. Currently, this is my approach:

Step 1: The original image

Step 2: The blurred image (median blur, kernel size 7)

Step 3: Generated binary images A and B

Then I use findContours to find contours on the binary images. If no candidates are found on binary image B (using a minimum and maximum bounding-box size threshold), findContours is run on binary image A and its candidates are returned. If one or more candidates are found on binary image B, the original image is re-blurred (with kernel size 15) and binary image C is used for finding contours and returning candidates (see generated binary image C).

This is the code for generating those binary images:

import cv2 as cv
import numpy as np

def generateMask(imgOriginal, rgb, margin):
  lowerLimit = np.asarray(rgb)
  upperLimit = lowerLimit+margin

  # switch limits if margin is negative
  if margin < 0:
    lowerLimit, upperLimit = upperLimit, lowerLimit

  mask = cv.inRange(imgOriginal, lowerLimit, upperLimit)

  return mask

# generates a set of six images with (combinations of) mask(s) applied
def applyMasks(imgOriginal, mask1, mask2):
  # applying both masks to original image
  singleAppliedMask1 = cv.bitwise_and(imgOriginal, imgOriginal, mask = mask1) #res3
  singleAppliedMask2 = cv.bitwise_and(imgOriginal, imgOriginal, mask = mask2) #res1

  # applying masks to overlap areas in single masked and original image
  doubleAppliedMaskOv1 = cv.bitwise_and(
    imgOriginal,
    singleAppliedMask1,
    mask = mask2
  ) #res4
  doubleAppliedMaskOv2 = cv.bitwise_and(
    imgOriginal,
    singleAppliedMask2,
    mask = mask1
  ) #res2

  # applying masks to joint areas in single masked and original image
  doubleAppliedMaskJoin1 = cv.bitwise_or(
    imgOriginal, 
    singleAppliedMask1, 
    mask = mask2
  ) #res7
  doubleAppliedMaskJoin2 = cv.bitwise_or(
    imgOriginal,
    singleAppliedMask2,
    mask = mask1
  ) #res6

  return (
    singleAppliedMask1, singleAppliedMask2,
    doubleAppliedMaskOv1, doubleAppliedMaskOv2,
    doubleAppliedMaskJoin1, doubleAppliedMaskJoin2
  )

def generateBinaries(appliedMasks):
  # variable names correspond to output variables in applyMasks()
  (sam1, sam2, damov1, damov2, damjo1, damjo2) = appliedMasks

  # generate thresholded images
  (_, sam1t) = cv.threshold(sam1, 0, 255, cv.THRESH_BINARY_INV)
  # note: identical to sam1t (both use THRESH_BINARY_INV) and not used below
  (_, sam1ti) = cv.threshold(sam1, 0, 255, cv.THRESH_BINARY_INV)
  (_, sam2t) = cv.threshold(sam2, 0, 255, cv.THRESH_BINARY)
  (_, sam2ti) = cv.threshold(sam2, 0, 255, cv.THRESH_BINARY_INV)

  (_, damov1t) = cv.threshold(damov1, 0, 255, cv.THRESH_BINARY)
  (_, damov2t) = cv.threshold(damov2, 0, 255, cv.THRESH_BINARY_INV)

  (_, damjo1t) = cv.threshold(damjo1, 0, 255, cv.THRESH_BINARY_INV)
  (_, damjo2t) = cv.threshold(damjo2, 0, 255, cv.THRESH_BINARY)

  # return differences in binary images; cv.subtract saturates at 0,
  # whereas "-" on uint8 arrays would wrap 0 - 255 around to 1
  return (
    cv.subtract(damov2t, sam2t),
    cv.subtract(sam1t, damov1t),
    cv.subtract(sam2ti, damjo2t)
  )

The result for this example image is good and very useful, even though it looks quite wrong (see the result image).

It would be very easy to improve the result for this example image (for example, returning only one or two candidates, including a perfect bounding box around the soccer ball). However, after extensive parameter tweaking, the parameters used in this example seem to produce the best overall recall.

However, I'm stuck on certain photos. Below I show their original images, binary images A and B (generated from the original image median-blurred with kernel size 7) and binary images C (kernel size 15). Currently my approach returns about 15 candidates per photo on average. For 25% of the photos, the candidates include at least one perfect bounding box around the ball; for about 75% of the photos, they include at least one partially correct bounding box (e.g. a box containing part of the ball, or a box covering only a piece of the ball).
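To make "perfect" and "partially correct" measurable while tuning parameters, one common option is intersection over union (IoU) between each candidate box and a hand-labelled ball box. This is a sketch: the function names and the 0.7 IoU cutoff for "perfect" are assumptions, not values from the question, and "partially correct" is approximated as any overlap at all.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def recall_stats(photos, perfect_iou=0.7):
    """photos: list of (candidate_boxes, ground_truth_box) pairs, one per photo.

    Returns (fraction of photos with a candidate at IoU >= perfect_iou,
             fraction of photos with a candidate overlapping the ball at all).
    """
    perfect = partial = 0
    for candidates, truth in photos:
        scores = [iou(c, truth) for c in candidates]
        perfect += any(s >= perfect_iou for s in scores)
        partial += any(s > 0 for s in scores)
    n = len(photos)
    return perfect / n, partial / n

if __name__ == "__main__":
    truth = (40, 40, 50, 50)
    print(iou((45, 40, 50, 50), truth))  # about 0.82
```

Tracking both fractions together with the average number of candidates per photo makes it easier to compare parameter sets than judging individual images by eye.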

Original images + binary images A

Binary images B + binary images C


I hope you can give me some suggestions on how to proceed.

Solution

There are many ways to do this. Using a neural network is probably a good choice, but you would still need to understand one and train it for your task.

You can use thresholding and Gaussian blurring; in addition, I would suggest template matching with normalized cross-correlation (NCC). Basically, you take a template: an image of the ball in your case or, even better, a set of images at different sizes, since the apparent size of the ball varies with its position.
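For reference, the standard zero-mean normalized cross-correlation between a template $T$ and the image patch of $I$ under it at offset $(x, y)$, with sums over all template pixels $(u, v)$, $\bar{T}$ the template mean and $\bar{I}_{x,y}$ the mean of the patch, is:

```latex
\mathrm{NCC}(x, y) =
\frac{\sum_{u,v} \bigl(I(x+u, y+v) - \bar{I}_{x,y}\bigr)\bigl(T(u,v) - \bar{T}\bigr)}
     {\sqrt{\sum_{u,v} \bigl(I(x+u, y+v) - \bar{I}_{x,y}\bigr)^{2}
            \,\sum_{u,v} \bigl(T(u,v) - \bar{T}\bigr)^{2}}}
```

It lies in $[-1, 1]$, and because both means are subtracted and the result is divided by both norms, it is unaffected by a brightness offset or a positive contrast scaling of the patch. This is what OpenCV's `TM_CCOEFF_NORMED` mode of `matchTemplate` computes.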

Then you slide the template over the image and check where it matches. Of course, this won't work on images with occlusion, but it may help you get some candidates.

More details about this process can be found in this paper (https://ieeexplore.ieee.org/document/5375779) or in these slides (http://www.cse.psu.edu/~rtc12/CSE486/lecture07.pdf).

I wrote a small code snippet to show the idea. I simply cropped the ball from the image (so I cheated, but it is just to illustrate the idea). It also uses only the absolute difference between the ball and the image; a more sophisticated measure (like NCC) would be better, but as I said, this is just an example.
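To keep an explicit loop like the one in the snippet but score with NCC instead of absolute differences, a per-patch NCC function could look like this. It is a sketch; the name `ncc` and the small `eps` guarding against flat (zero-variance) patches are my own choices.

```python
import numpy as np

def ncc(patch, template, eps=1e-12):
    """Zero-mean normalized cross-correlation of two equally sized arrays, in [-1, 1]."""
    p = patch.astype(np.float64) - patch.mean()
    t = template.astype(np.float64) - template.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    # a flat patch or template has zero variance: eps makes the score 0 instead of NaN
    return float((p * t).sum() / (denom + eps))
```

Inside the loop, `matching_img[i, j] = ncc(findtheball[i:i + ball.shape[0], j:j + ball.shape[1]], ball)` would replace the two scoring lines. NCC is already "higher is better", so no inversion is needed, but thresholds based on `np.mean(matching_img)` would need rethinking because NCC can be negative.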

(Image: the cropped ball, loaded as ball.jpg below)

import matplotlib.pyplot as plt
import numpy as np

def rgb2gray(rgb):
    # weighted sum of the R, G and B channels (ITU-R BT.601 luma weights)
    r, g, b = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    return gray

def candidate_mask(matching_img, threshold_val, height, width):
    # mark a template-sized box at every top-left position whose score exceeds the threshold
    show_match = np.zeros_like(matching_img)
    for y, x in zip(*np.where(matching_img > threshold_val)):
        show_match[y:y + height, x:x + width] = 1
    return show_match

if __name__ == "__main__":

    ball = rgb2gray(plt.imread('ball.jpg'))
    findtheballcol = plt.imread('findtheball.jpg')
    findtheball = rgb2gray(findtheballcol)
    matching_img = np.zeros(findtheball.shape)

    # METHOD 1: score every top-left position where the template fits entirely
    height, width = ball.shape
    for i in range(findtheball.shape[0] - height + 1):
        for j in range(findtheball.shape[1] - width + 1):
            # here use NCC or something better
            matching_score = np.abs(ball - findtheball[i:i + height, j:j + width])
            # inverting so that max is what we are looking for (epsilon avoids division by zero)
            matching_img[i, j] = 1 / (np.sum(matching_score) + 1e-6)

    plt.subplot(221)
    plt.imshow(findtheball, cmap='gray')
    plt.title('Image')
    plt.subplot(222)
    plt.imshow(matching_img, cmap='jet')
    plt.title('Matching Score')
    plt.subplot(223)
    # pick a threshold
    threshold_val = np.mean(matching_img) * 2
    plt.imshow(candidate_mask(matching_img, threshold_val, height, width))
    plt.title('Candidates')
    plt.subplot(224)
    # higher threshold
    threshold_val = np.mean(matching_img) * 3
    plt.imshow(candidate_mask(matching_img, threshold_val, height, width))
    plt.title('Best Candidate')
    plt.show()
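The nested per-pixel Python loop is slow on full-size photos. A sketch of the same sum-of-absolute-differences score computed with whole-array NumPy operations instead: it loops over the template's pixel offsets (h * w iterations) rather than over image positions, so memory stays proportional to the image size. `sad_score_map` is a hypothetical helper name, and the output has one entry per valid top-left position (no zero border).

```python
import numpy as np

def sad_score_map(image, template, eps=1e-6):
    """Inverted sum of absolute differences for every top-left template position.

    Output shape is (H - h + 1, W - w + 1); larger values mean better matches.
    """
    # float conversion avoids uint8 wrap-around in the subtraction below
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    H, W = image.shape
    h, w = template.shape
    sad = np.zeros((H - h + 1, W - w + 1))
    for u in range(h):
        for v in range(w):
            # entry (i, j) accumulates |image[i + u, j + v] - template[u, v]|
            sad += np.abs(image[u:u + H - h + 1, v:v + W - w + 1] - template[u, v])
    return 1.0 / (sad + eps)
```

With this, `matching_img = sad_score_map(findtheball, ball)` gives scores indexed by top-left corner, consistent with drawing candidate boxes from `(y, x)` to `(y + height, x + width)`.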

Have fun!