python opencv neural-network classification sift

SIFT Input to ANN


I'm trying to classify images using an Artificial Neural Network and the approach I want to try is:

  1. Get feature descriptors (using SIFT for now)
  2. Classify using a Neural Network

I'm using OpenCV 3 and Python for this.

I'm relatively new to Machine Learning and I have the following question:

Each image that I analyse will have a different number of 'keypoints' and hence a different shape for its 2D 'descriptor' array. How do I decide the input for my ANN? For example, for one sample image the descriptor shape is (12211, 128). Do I flatten this array and use it as the input, in which case I have to worry about varying input sizes for each image, or do I compute something else for the input?


Solution

  • I'm not sure this is a definitive solution, but it worked for me. The main idea is as follows:

    • Divide your image into an M×N grid.
    • Extract a fixed number of feature points from each sub-image.
    • Concatenate the results for all the sub-images to obtain a single feature vector for the entire image.
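
As a minimal sketch of why these three steps give a fixed-size input, the snippet below runs them with NumPy only. The per-tile descriptor (`describe_tile`, the mean and standard deviation of the pixel values) is a hypothetical stand-in for SIFT, chosen only so the example runs without OpenCV. `grid_feature_vector` is likewise an illustrative name and not part of the answer's code.

```python
import numpy as np


def describe_tile(tile):
    """Hypothetical stand-in for a SIFT descriptor: 2 values per tile."""
    return np.array([tile.mean(), tile.std()], dtype=np.float32)


def grid_feature_vector(image, nrows, ncols):
    """Split image into an nrows x ncols grid and concatenate tile descriptors."""
    parts = []
    # array_split tolerates shapes that are not exact multiples of the grid.
    for row_band in np.array_split(image, nrows, axis=0):
        for tile in np.array_split(row_band, ncols, axis=1):
            parts.append(describe_tile(tile))
    return np.concatenate(parts)


# Two images of different sizes yield vectors of the same length.
small = np.random.default_rng(0).integers(0, 256, size=(60, 80))
large = np.random.default_rng(1).integers(0, 256, size=(123, 457))
vec_small = grid_feature_vector(small, nrows=4, ncols=4)
vec_large = grid_feature_vector(large, nrows=4, ncols=4)
print(vec_small.shape, vec_large.shape)
```

Both vectors have length 4 * 4 * 2 = 32 regardless of image size. With one 128-value SIFT descriptor per tile, the length would instead be nrows * ncols * 128.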

    The supporting code (the function "pre_process_images") is roughly as follows:

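    from itertools import product

    import cv2
    import numpy as np

    def tiles(arr, nrows, ncols):
        """
        Split an image array into nrows x ncols tiles, returned as a list of
        numpy arrays that each preserve the "physical" layout of arr.

        When the array shape (rows, cols) is not divisible by (nrows, ncols),
        the tile sizes vary slightly, following numpy.array_split.
        """
        rows, cols = arr.shape[:2]  # works for grayscale and colour images
        col_arr = np.array_split(range(cols), ncols)
        row_arr = np.array_split(range(rows), nrows)
        return [arr[r[0]: r[-1] + 1, c[0]: c[-1] + 1]
                for r, c in product(row_arr, col_arr)]

    def pre_process_images(data, dimensions=(28, 28)):
        images = data['image']
        features = []
        nrows, ncols = dimensions
        # Ask SIFT for only the strongest keypoint in each tile.
        # This is the OpenCV 3 (opencv-contrib) API; newer OpenCV releases
        # expose the same constructor as cv2.SIFT_create.
        sift = cv2.xfeatures2d.SIFT_create(1)
        for count, arr in enumerate(images, start=1):
            image_feature = []
            for small_image in tiles(arr, nrows, ncols):
                kps, descs = sift.detectAndCompute(small_image, None)
                # A tile may yield no keypoint (descs is None) or more than
                # one descriptor, so keep exactly one 128-value descriptor
                # per tile and fall back to zeros when none is found.
                desc = np.zeros(128, dtype=np.float32)
                if descs is not None and len(descs) > 0:
                    desc = descs[0]
                image_feature.append(desc)
            # One flat vector of length nrows * ncols * 128 per image.
            features.append(np.concatenate(image_feature))
            print(count)  # progress indicator

        data['sift_features'] = features
        return data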
    

    However, this is extremely slow. I'm currently working on using PCA to select a smaller, more informative set of features.
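
For the PCA step mentioned above, a minimal sketch using only NumPy could look like the following. It assumes the per-image feature vectors have been stacked into a matrix `X` of shape (n_images, n_features). `pca_reduce` and `n_components` are illustrative names, and the random data merely stands in for real SIFT features. Note that this shrinks the input to the network; it does not speed up the SIFT extraction itself.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project rows of X onto their top n_components principal components."""
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    return X_centered @ components.T, mean, components


# Stand-in data: 50 "images", each with a 4 * 4 * 128 = 2048-value vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2048)).astype(np.float32)
X_reduced, mean, components = pca_reduce(X, n_components=10)
print(X_reduced.shape)

# A new image must be projected with the same mean and components.
x_new = rng.normal(size=2048).astype(np.float32)
x_new_reduced = (x_new - mean) @ components.T
```

The mean and components are returned alongside the reduced data so that the same projection, fitted on the training images, can be reused for images seen later.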