I'm trying to classify images using an Artificial Neural Network, and the approach I want to try is to extract keypoints and feature descriptors (SIFT) from each image and use those descriptors as the network's input. I'm using OpenCV 3 and Python for this.
I'm relatively new to Machine Learning, and I have the following question:
Each image that I analyse will have a different number of keypoints, and hence a different shape of the 2D descriptor array. How do I decide on the input for my ANN? For example, for one sample image the descriptor shape is (12211, 128). Do I flatten this array and use it as the input, in which case I have to worry about a varying input size for each image, or do I compute something else to use as the input?
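For context, a minimal sketch of the situation (the file names are hypothetical):

import cv2

sift = cv2.xfeatures2d.SIFT_create()
for path in ('img1.jpg', 'img2.jpg'):   # hypothetical sample images
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    kps, descs = sift.detectAndCompute(img, None)
    # descs has shape (number_of_keypoints, 128); the number of
    # keypoints, and hence the first dimension, varies per image
    print(path, None if descs is None else descs.shape)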
I'm not sure if this is an exact solution, but it worked for me. The main idea is as follows:

1. Split every image into the same fixed grid of tiles.
2. Run SIFT on each tile, keeping a fixed maximum number of keypoints per tile (here just one).
3. Flatten the per-tile descriptors and collect them, so every image produces a feature vector of the same length.

The supporting code is roughly as given below (the function pre_process_images):
import cv2
import numpy as np
from itertools import product

def tiles(arr, nrows, ncols):
    """
    If arr is a 2D array, the returned list contains nrows x ncols numpy
    arrays, with each array preserving the "physical" layout of arr.
    When the array shape (rows, cols) is not divisible by (nrows, ncols),
    some of the sub-array dimensions change according to numpy.array_split.
    """
    rows, cols, channel = arr.shape
    col_arr = np.array_split(range(cols), ncols)
    row_arr = np.array_split(range(rows), nrows)
    # Slice out each tile by its row/column index range
    return [arr[r[0]: r[-1] + 1, c[0]: c[-1] + 1]
            for r, c in product(row_arr, col_arr)]
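As a quick sanity check, a sketch of what tiles returns for a dummy image (the size is just an example):

dummy = np.zeros((100, 60, 3), dtype=np.uint8)   # hypothetical 100x60 BGR image
parts = tiles(dummy, nrows=4, ncols=3)
print(len(parts))       # 12 tiles
print(parts[0].shape)   # (25, 20, 3)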
def pre_process_images(data, dimensions=(28, 28)):
    images = data['image']
    features = []
    count = 1
    nrows, ncols = dimensions
    # Keep at most one SIFT keypoint per tile, so every image
    # yields the same number of descriptors
    sift = cv2.xfeatures2d.SIFT_create(1)
    for arr in images:
        image_feature = []
        cut_image = tiles(arr, nrows, ncols)
        for small_image in cut_image:
            (kps, descs) = sift.detectAndCompute(small_image, None)
            if descs is None:
                # SIFT found no keypoint in this tile; pad with zeros
                # so the feature vector keeps a fixed length
                descs = np.zeros((1, 128), dtype=np.float32)
            image_feature.append(descs.flatten())
        features.append(image_feature)
        print(count)
        count += 1
    data['sift_features'] = features
    return data
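A sketch of how this might be called (the file names and the plain-dict layout of data are assumptions):

imgs = [cv2.imread(p) for p in ('img1.jpg', 'img2.jpg')]   # hypothetical paths
data = pre_process_images({'image': imgs}, dimensions=(28, 28))
# 28 x 28 tiles x 128 values = 100352 features per image
vec = np.concatenate(data['sift_features'][0])
print(vec.shape)   # (100352,)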
However, this is extremely slow. Right now I'm working on a way to select the features optimally for this using PCA.
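For the PCA step, a minimal sketch with scikit-learn (the choice of 256 components is an arbitrary assumption; in practice it should be picked from the explained variance):

from sklearn.decomposition import PCA

# One row per image, one column per descriptor value
X = np.array([np.concatenate(f) for f in data['sift_features']])

pca = PCA(n_components=256)          # assumes more than 256 images
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)               # (n_images, 256)
print(pca.explained_variance_ratio_.sum())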