Tags: python, opencv, pdf, image-processing, resolution

Resolution Manipulation for Template Matching in OpenCV


I am trying to use template matching to find an equation inside a given PDF document that is generated from LaTeX. When I use the code from OpenCV's template-matching tutorial, I get a very good match only when I crop the picture from the original page (converted to JPEG or PNG); however, when I compile the equation code separately and generate a JPG/PNG output from it, the matching goes tremendously wrong.

I believe the reason is related to the resolution, but since I am an amateur in this field I cannot reasonably make the JPG generated from the standalone equation have the same pixel structure as the JPG of the whole page. Here is the code, copied (more or less) from the above-mentioned OpenCV tutorial, which is a Python implementation:

import cv2
from PIL import Image

img = cv2.imread('location of the original image', 0)
img2 = img.copy()
template = cv2.imread('location of the patch I look for',0)
w, h = template.shape[::-1]

# All the 6 methods for comparison in a list
methods = ['cv2.TM_CCOEFF', 'cv2.TM_CCOEFF_NORMED', 'cv2.TM_CCORR',
            'cv2.TM_CCORR_NORMED', 'cv2.TM_SQDIFF', 'cv2.TM_SQDIFF_NORMED']

method = eval(methods[0])  # eval turns the string name into the corresponding cv2 constant

# Apply template Matching
res = cv2.matchTemplate(img,template,method)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
# If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take minimum
if method in [cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED]:
    top_left = min_loc
else:
    top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
print(top_left, bottom_right)

img = Image.open('location of the original image')

#cropping the original image with the found coordinates to make a qualitative comparison
cropped = img.crop((top_left[0], top_left[1], bottom_right[0], bottom_right[1]))
cropped.save('location to save the cropped image using the coordinates found by template matching')

Here is a sample page in which I look for the first equation: [sample page image]

The code to generate a specific standalone equation is as follows:

\documentclass[preview]{standalone}
\usepackage{amsmath}
\begin{document}\begin{align*}
(\mu_1+\mu_2)(\emptyset) = \mu_1(\emptyset) + \mu_2(\emptyset) = 0 + 0 =0
\label{eq_0}
\end{align*}
\end{document}

I compile this and then trim the white space around the equation, either using pdfcrop or using the .image() method in PythonMagick. Template matching with this trimmed output on the original page does not give a reasonable result. Here is the trimmed/converted output produced with pdfcrop/Mac's Preview.app:

[trimmed equation image]
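In principle, the scale mismatch could be avoided by rasterizing both PDFs at the same resolution, so that the equation and the page share one pixel scale. A minimal sketch, assuming the pdf2image package (a wrapper around poppler) is used; the DPI value and file names are placeholders:

from pdf2image import convert_from_path

DPI = 300  # placeholder value; must be identical for both renders

# Render the full page and the standalone equation at the same resolution,
# then save grayscale versions for matching.
page = convert_from_path('page.pdf', dpi=DPI)[0]
formula = convert_from_path('equation.pdf', dpi=DPI)[0]
page.convert('L').save('page.png')
formula.convert('L').save('equation.png')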

Cropping the equation directly from the page above works perfectly. I would appreciate some explanation and help.

EDIT: I also found the following approach, which applies template matching while brute-forcing different possible scales: http://www.pyimagesearch.com/2015/01/26/multi-scale-template-matching-using-python-opencv/

However, since I plan to process as many as 1000 documents, this seems a very slow way to go. Plus, I imagine there should be a more logical way of handling it, by somehow finding the relevant scales.
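If the resolutions at which the page and the standalone equation were rasterized are known, the relevant scale is simply their ratio, so the template only needs to be resized once instead of brute-forcing many scales. A minimal sketch with hypothetical DPI values:

import cv2

PAGE_DPI = 150      # hypothetical resolution of the rendered page
TEMPLATE_DPI = 300  # hypothetical resolution of the standalone equation

template = cv2.imread('equation.png', 0)
# Rescale the template by the DPI ratio so it matches the page's pixel scale.
scale = PAGE_DPI / float(TEMPLATE_DPI)
template = cv2.resize(template, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_AREA)
# template can now be passed to cv2.matchTemplate at a single scale.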


Solution

  • Instead of template matching you could use features, i.e. keypoints with descriptors. They are scale-invariant, so you do not need to iterate over differently scaled versions of the image.

    The Python example find_obj.py provided with OpenCV works for your given example; here it is invoked with BRISK features:

    python find_obj.py --feature=brisk rB4Yy_big.jpg ZjBAA.jpg
    

    [matching result image]

    Note that I did not use the cropped version of the formula to search for, but a version with some white pixels around it, so that the keypoint detection can work correctly. There needs to be some space around the formula because keypoints have to lie completely inside the image; otherwise the descriptors cannot be calculated.

    [padded formula image]
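    If you only have the tightly cropped formula, such a border can be added programmatically. A minimal sketch using cv2.copyMakeBorder; the 20-pixel margin is an arbitrary choice:

    import cv2

    formula = cv2.imread('formula_cropped.png', 0)
    # Add a white border on all sides so keypoints near the formula's edge
    # have enough surrounding pixels for their descriptors.
    padded = cv2.copyMakeBorder(formula, 20, 20, 20, 20,
                                cv2.BORDER_CONSTANT, value=255)
    cv2.imwrite('formula_padded.png', padded)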

    The big image is the original from your post.

    One additional remark: you will always get some matches. If the formula image you are searching for is not present in the big image, the matches will be nonsensical. If you need to sort out these false positives, you can, for example, require a minimum number of RANSAC inliers from cv2.findHomography or check that the estimated homography describes a plausible transformation.
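    A minimal sketch of such an inlier-based check (the thresholds are arbitrary and would need tuning on real data):

    import numpy as np

    def is_plausible_match(status, min_inliers=10, min_ratio=0.5):
        # status is the inlier mask returned by cv2.findHomography
        inliers = int(np.sum(status))
        return inliers >= min_inliers and inliers >= min_ratio * len(status)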


    Edit: Since you asked for it, here is a version that draws the bounding box around the found formula instead of the matches:

    #!/usr/bin/env python
    
    # Python 2/3 compatibility
    from __future__ import print_function
    
    import numpy as np
    import cv2
    
    def init_feature():
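        # BRISK yields binary descriptors, so match them with the Hamming norm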
        detector = cv2.BRISK_create()
        norm = cv2.NORM_HAMMING
        matcher = cv2.BFMatcher(norm)
        return detector, matcher
    
    def filter_matches(kp1, kp2, matches, ratio = 0.75):
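        # Lowe's ratio test: keep a match only if it is clearly better than the runner-up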
        mkp1, mkp2 = [], []
        for m in matches:
            if len(m) == 2 and m[0].distance < m[1].distance * ratio:
                m = m[0]
                mkp1.append( kp1[m.queryIdx] )
                mkp2.append( kp2[m.trainIdx] )
        p1 = np.float32([kp.pt for kp in mkp1])
        p2 = np.float32([kp.pt for kp in mkp2])
        kp_pairs = zip(mkp1, mkp2)
        return p1, p2, kp_pairs
    
    def explore_match(win, img1, img2, kp_pairs, status = None, H = None):
        h1, w1 = img1.shape[:2]
        h2, w2 = img2.shape[:2]
        vis = np.zeros((max(h1, h2), w1+w2), np.uint8)
        vis[:h1, :w1] = img1
        vis[:h2, w1:w1+w2] = img2
        vis = cv2.cvtColor(vis, cv2.COLOR_GRAY2BGR)
    
        if H is not None:
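            # project the corners of img1 with H and shift them onto img2's half of the canvas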
            corners = np.float32([[0, 0], [w1, 0], [w1, h1], [0, h1]])
            corners = np.int32( cv2.perspectiveTransform(corners.reshape(1, -1, 2), H).reshape(-1, 2) + (w1, 0) )
            cv2.polylines(vis, [corners], True, (0, 0, 255))
    
        cv2.imshow(win, vis)
        return vis
    
    if __name__ == '__main__':
    
        img1 = cv2.imread('rB4Yy_big.jpg' , 0)
        img2 = cv2.imread('ZjBAA.jpg', 0)
        detector, matcher = init_feature()
    
        kp1, desc1 = detector.detectAndCompute(img1, None)
        kp2, desc2 = detector.detectAndCompute(img2, None)
    
        raw_matches = matcher.knnMatch(desc1, trainDescriptors = desc2, k = 2)
        p1, p2, kp_pairs = filter_matches(kp1, kp2, raw_matches)
        if len(p1) >= 4:
            H, status = cv2.findHomography(p1, p2, cv2.RANSAC, 5.0)
            print('%d / %d  inliers/matched' % (np.sum(status), len(status)))
            vis = explore_match('find_obj', img1, img2, kp_pairs, status, H)
            cv2.waitKey()
            cv2.destroyAllWindows()
        else:
            print('%d matches found, not enough for homography estimation' % len(p1))
    

    [bounding box image]
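    To actually crop the located formula out of the page, as the question intended, the corner projection from explore_match can be reused without the display offset. A minimal sketch, reusing img1 (the formula), img2 (the page) and H from the script above:

    # Project the template corners into page coordinates (no (w1, 0) shift).
    h1, w1 = img1.shape[:2]
    corners = np.float32([[0, 0], [w1, 0], [w1, h1], [0, h1]])
    quad = cv2.perspectiveTransform(corners.reshape(1, -1, 2), H).reshape(-1, 2)

    # Crop the axis-aligned bounding box of the projected quadrilateral.
    x0, y0 = np.int32(quad.min(axis=0).clip(min=0))
    x1, y1 = np.int32(quad.max(axis=0))
    cv2.imwrite('formula_found.png', img2[y0:y1, x0:x1])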