python image opencv computer-vision feature-detection

OpenCV, Python: How to stitch two images of different sizes and transparent backgrounds

I've been working on a project where I stitch together images from a drone flying in a lawn mower pattern. I am able to stitch together images from a single pass (thanks to many answers on stackoverflow) but when I try to stitch two separate passes together laterally, the transformation my method produces is nonsensical. Here are the two images I am trying to stitch:

And here is the code that I've been using to estimate a homography between the two, base and curr.

base_gray = cv2.cvtColor(base, cv2.COLOR_BGRA2GRAY)
curr_gray = cv2.cvtColor(curr, cv2.COLOR_BGRA2GRAY)

detector = cv2.ORB_create()
base_keys, base_desc = detector.detectAndCompute(base_gray, None)
curr_keys, curr_desc = detector.detectAndCompute(curr_gray, None)

FLANN_INDEX_LSH = 6
flann_params = dict(algorithm = FLANN_INDEX_LSH,
                    table_number = 12,
                    key_size = 20,
                    multi_probe_level = 2)
search_params = dict(checks=100)
matcher = cv2.FlannBasedMatcher(flann_params, search_params)
matches = matcher.match(base_desc, curr_desc)

max_dist = 0.0
min_dist = 100.0

for match in matches:
    dist = match.distance
    min_dist = dist if dist < min_dist else min_dist
    max_dist = dist if dist > max_dist else max_dist

good_matches = [match for match in matches if match.distance <= 10 * min_dist ]

base_matches = []
curr_matches = []
for match in good_matches:
    base_matches.append(base_keys[match.queryIdx].pt)
    curr_matches.append(curr_keys[match.trainIdx].pt)

bm_final = np.asarray(base_matches)
cm_final = np.asarray(curr_matches)

# find perspective transformation using the arrays of corresponding points
transformation, hom_stati = cv2.findHomography(cm_final, bm_final, method=cv2.RANSAC, ransacReprojThreshold=1)

As I said, it doesn't work. Is it because the transparent backgrounds are messing with the calculation?

Solution

I think Flann is probably not what you want to use for matching here. First, indeed, since you're converting to grayscale, the black spots, the edges of the images, etc will likely be included in your feature set, which you do not want. Secondly, Flann uses methods to build specific descriptors for fast searching through an image database; it is used for CBIR, not for homography estimation.

Instead, just take a normal approach with SIFT or SURF or ORB or BRISK. Note that all of those allow to add a mask for their keypoint detection step, so that you can create a mask from the alpha channel to ignore keypoints in. See the OpenCV docs for SIFT and SURF and for ORB and BRISK for more.