Tags: python, opencv, numpy, feature-detection

OpenCV, Python: How to use mask parameter in ORB feature detector


By reading a few answers on Stack Overflow, I've learned this much so far:

The mask has to be a numpy array (which has the same shape as the image) with data type CV_8UC1 and have values from 0 to 255.

What is the meaning of these numbers, though? Is it that any pixels with a corresponding mask value of zero will be ignored in the detection process and any pixels with a mask value of 255 will be used? What about the values in between?

Also, how do I initialize a numpy array with data type CV_8UC1 in Python? Can I just use dtype=cv2.CV_8UC1?

Here is the code I am using currently, based on the assumptions I'm making above. But the issue is that I don't get any keypoints when I run detectAndCompute for either image. I have a feeling it might be because the mask isn't the correct data type. If I'm right about that, how do I correct it?

# convert images to grayscale
base_gray = cv2.cvtColor(self.base, cv2.COLOR_BGRA2GRAY)
curr_gray = cv2.cvtColor(self.curr, cv2.COLOR_BGRA2GRAY)

# initialize feature detector
detector = cv2.ORB_create()

# create a mask using the alpha channel of the original image--don't
# use transparent or partially transparent parts
base_cond = self.base[:,:,3] == 255
base_mask = np.array(np.where(base_cond, 255, 0))

curr_cond = self.base[:,:,3] == 255
curr_mask = np.array(np.where(curr_cond, 255, 0), dtype=np.uint8)

# use the mask and grayscale images to detect good features
base_keys, base_desc = detector.detectAndCompute(base_gray, mask=base_mask)
curr_keys, curr_desc = detector.detectAndCompute(curr_gray, mask=curr_mask)

print("base keys: ", base_keys)
# []
print("curr keys: ", curr_keys)
# []

Solution

    So here is most, if not all, of the answer:

    What is the meaning of those numbers?

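    0 means to ignore the pixel and 255 means to use it. I'm still unclear on the values in between, and I wouldn't rely on every nonzero value being treated as "equivalent" to 255. The OpenCV documentation for Feature2D::detect only says the mask must be an 8-bit integer matrix with non-zero values in the region of interest, so using only 0 and 255 avoids any ambiguity.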
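If a mask comes from a source that can contain intermediate values (an anti-aliased alpha channel, for example), one defensive option is to binarize it so every pixel is exactly 0 or 255 before handing it to ORB. A minimal NumPy sketch; the function name `binarize_mask` and its `keep_above` cutoff are hypothetical choices for illustration, not OpenCV API:

```python
import numpy as np

def binarize_mask(raw, keep_above=0):
    """Return a uint8 mask: 255 where raw > keep_above, 0 elsewhere.

    keep_above is a hypothetical cutoff: 0 keeps every nonzero pixel,
    254 keeps only fully "on" (255) pixels.
    """
    return np.where(raw > keep_above, 255, 0).astype(np.uint8)

raw = np.array([[0, 1, 128], [200, 254, 255]], dtype=np.uint8)
print(binarize_mask(raw))
# [[  0 255 255]
#  [255 255 255]]
print(binarize_mask(raw, keep_above=254))
# [[  0   0   0]
#  [  0   0 255]]
```

Choosing `keep_above=254` matches the question's intent of using only fully opaque pixels.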

    Also, how do I initialize a numpy array with data type CV_8UC1 in python?

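    The type CV_8U is the unsigned 8-bit integer, which in NumPy is numpy.uint8. The C1 suffix means that the array has 1 channel, as opposed to 3 channels for color images and 4 channels for RGBA images. Note that cv2.CV_8UC1 itself is just an integer type code used by OpenCV, not a NumPy dtype, so pass dtype=np.uint8 rather than dtype=cv2.CV_8UC1. So, to create a 1-channel array of unsigned 8-bit integers: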

    import numpy as np
    np.zeros((480, 720), dtype=np.uint8)
    

    (A three-channel array would have shape (480, 720, 3), a four-channel one (480, 720, 4), and so on.) This mask would cause the detector and extractor to ignore the entire image, though, since it is all zeros.
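A useful mask usually starts as all zeros ("ignore everything") and then has a region of interest set to 255. A minimal sketch, assuming a grayscale image `gray` (here a synthetic stand-in) and a hypothetical rectangle of interest:

```python
import numpy as np

# Synthetic stand-in for a grayscale image: (rows, cols) = (480, 720)
gray = np.zeros((480, 720), dtype=np.uint8)

# Same 2-D shape as the image, 8-bit unsigned, 1 channel
mask = np.zeros(gray.shape[:2], dtype=np.uint8)

# Hypothetical region of interest: rows 100-299, columns 200-499
mask[100:300, 200:500] = 255

print(mask.shape, mask.dtype)    # (480, 720) uint8
print(int((mask == 255).sum()))  # 60000
```

Deriving the shape from `gray.shape[:2]` keeps the mask in step with the image instead of hard-coding its size.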

    how do I correct [the code]?

    There were two separate issues, and each one on its own emptied one of the two keypoint arrays.

    First, I forgot to set the dtype for base_mask:

    base_mask = np.array(np.where(base_cond, 255, 0))  # wrong: default integer dtype
    base_mask = np.array(np.where(base_cond, 255, 0), dtype=np.uint8)  # right
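The dtype matters because, with plain Python integers as the two branch values, `np.where` returns NumPy's default integer type (int64 on most 64-bit Linux and macOS builds), never uint8. A small check, using a made-up condition array in place of `base_cond`:

```python
import numpy as np

# Made-up stand-in for base_cond
cond = np.array([[True, False], [False, True]])

wrong = np.where(cond, 255, 0)                   # default integer dtype
right = np.where(cond, 255, 0).astype(np.uint8)  # 8-bit unsigned, 1 channel

print(wrong.dtype)     # platform-dependent, e.g. int64
print(right.dtype)     # uint8
print(right.tolist())  # [[255, 0], [0, 255]]
```

`.astype(np.uint8)` and `np.array(..., dtype=np.uint8)` produce the same result here; either way the mask ends up as the 8-bit type OpenCV expects.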
    

    Second, I used the wrong image to generate my curr_cond array:

    curr_cond = self.base[:,:,3] == 255 # wrong
    curr_cond = self.curr[:,:,3] == 255 # right
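This kind of slip comes from duplicating the mask code for each image. One way to rule it out is to build each mask through a single helper that only sees one image. A sketch; `opaque_mask` is a hypothetical name, and it assumes an (H, W, 4) BGRA array with alpha in channel 3, as in the question:

```python
import numpy as np

def opaque_mask(bgra):
    """Build an ORB mask from one BGRA image: 255 where alpha == 255, else 0.

    Assumes an (H, W, 4) array with the alpha channel at index 3.
    """
    if bgra.ndim != 3 or bgra.shape[2] != 4:
        raise ValueError("expected an (H, W, 4) BGRA image")
    return np.where(bgra[:, :, 3] == 255, 255, 0).astype(np.uint8)

# Two synthetic BGRA images with different opaque regions
base = np.zeros((4, 4, 4), dtype=np.uint8)
base[:2, :, 3] = 255   # top half opaque
curr = np.zeros((4, 4, 4), dtype=np.uint8)
curr[:, 2:, 3] = 255   # right half opaque

base_mask = opaque_mask(base)
curr_mask = opaque_mask(curr)   # each mask comes from its own image
print(base_mask[:, 0].tolist())  # [255, 255, 0, 0]
print(curr_mask[0].tolist())     # [0, 0, 255, 255]
```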
    

    Some pretty dumb mistakes.

    Here is the full corrected code:

    import cv2
    import numpy as np

    # convert images to grayscale
    base_gray = cv2.cvtColor(self.base, cv2.COLOR_BGRA2GRAY)
    curr_gray = cv2.cvtColor(self.curr, cv2.COLOR_BGRA2GRAY)
    
    # initialize feature detector
    detector = cv2.ORB_create()
    
    # create a mask using the alpha channel of the original image--don't
    # use transparent or partially transparent parts
    base_cond = self.base[:,:,3] == 255
    base_mask = np.array(np.where(base_cond, 255, 0), dtype=np.uint8)
    
    curr_cond = self.curr[:,:,3] == 255
    curr_mask = np.array(np.where(curr_cond, 255, 0), dtype=np.uint8)
    
    # use the mask and grayscale images to detect good features
    base_keys, base_desc = detector.detectAndCompute(base_gray, mask=base_mask)
    curr_keys, curr_desc = detector.detectAndCompute(curr_gray, mask=curr_mask)
    

    TL;DR: The mask parameter is a 1-channel numpy array with the same shape as the grayscale image in which you are trying to find features (if image shape is (480, 720), so is mask).

    The values in the array are of type np.uint8: 255 means "use this pixel" and 0 means "don't".
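The TL;DR contract can also be checked before calling detectAndCompute, which turns a silently empty keypoint list into a readable complaint. A minimal sketch; `mask_problems` is a hypothetical helper, not part of OpenCV:

```python
import numpy as np

def mask_problems(gray, mask):
    """List the ways `mask` breaks the TL;DR contract for image `gray`."""
    problems = []
    if mask.dtype != np.uint8:
        problems.append(f"dtype is {mask.dtype}, expected uint8")
    if mask.shape != gray.shape[:2]:
        problems.append(f"shape is {mask.shape}, expected {gray.shape[:2]}")
    if not np.any(mask):
        problems.append("all zeros, so every pixel would be ignored")
    return problems

gray = np.zeros((480, 720), dtype=np.uint8)
print(mask_problems(gray, np.full((480, 720), 255, dtype=np.uint8)))  # []
print(mask_problems(gray, np.full((480, 720), 255)))  # dtype problem
print(mask_problems(gray, np.zeros((480, 720, 1), dtype=np.uint8)))  # shape + all zeros
```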

    Thanks to Dan Mašek for leading me to parts of this answer.