python numpy vectorization array-broadcasting sliding-window

How to call a function on the slices of a vectorized sliding window?

I'm trying to vectorize a sliding window search for object detection. So far I have been able to use numpy broadcasting to slice my main image into window sized slices that I have stored in the variable all_windows seen below. I have verified that the actual values match so I'm happy with it up to that point.

The next part is where I'm having trouble. I'd like to index into the all_windows array as I call the patchCleanNPredict() function so that I can pass each window into the function in a similarly vectorized format.

I was trying to create an array called new_indx that would contain the slice indices in a 2d array, e.g. ([0,0], [1,0], [2,0]...) but have been running into problems.

I'm hoping to end up with an array of confidence values for each window. The code below works in python 3.5. Thanks in advance for any help/advice.

import numpy as np

def patchCleanNPredict(patch):
    # patch = cv2.resize()# shrink patches with opencv resize function
    patch = np.resize(patch.flatten(),(1,np.shape(patch.flatten())[0])) # flatten the patch
    print('patch: ',patch.shape) 
    # confidence = predict(patch) # fake function showing prediction intent
    return # confidence


window = (30,46)# window dimensions
strideY = 10
strideX = 10

img = np.random.randint(0,245,(640,480)) # image that is being sliced by the windows

indx = np.arange(0,img.shape[0]-window[1],strideY)[:,None]+np.arange(window[1])
vertical_windows = img[indx]
print(vertical_windows.shape) # returns (60,46,480)


vertical_windows = np.transpose(vertical_windows,(0,2,1))
indx = np.arange(0,vertical_windows.shape[1]-window[0],strideX)[:,None]+np.arange(window[0])
all_windows = vertical_windows[0:vertical_windows.shape[0],indx]
all_windows = np.transpose(all_windows,(1,0,3,2))

print(all_windows.shape) # returns (45,60,46,30)


data_patch_size = (int(window[0]/2),int(window[1]/2)) # size the windows will be shrunk to

single_patch = all_windows[0,0,:,:]
patchCleanNPredict(single_patch) # prints the flattened patch size (1,1380)

new_indx = (1,1) # should this be an array of indices? 
patchCleanNPredict(all_windows[new_indx,:,:]) ## this is where I'm having trouble

Solution

To evaluate a function on all of the windows in a vectorized manner I ended up having to do a good amount of resizing and rearranging with np.transpose to get it all to broadcast correctly. The code below works and has for loops to display and confirm that the image windows haven't been garbled/mixed up. They would be deleted/commented for full speed runs.

A small disclaimer: I figure there must be cleaner implementations of sliding windows across 2D matrices, but since I wasn't able to find any the example below might help others. Also some of the frequent rearranging and resizing could probably be cleaned up with a more thorough understanding of broadcasting syntax.

import numpy as np
import cv2


def Predict(flattened_patches):
    # taking the mean of the flattened windows and then returning the
    # index of the row (window) with the highest mean, a predicter would have the same syntax
    results = flattened_patches.mean(1) 
    max_index = results.argmax() 
    return results, max_index

## -------- image and sliding window setup -------------------------
AR = 1.45 # choose an aspect ratio to maintain throughout scaling steps
win_h = 200 # window height
win_w = int(win_h/AR) # window width
window = (win_w,win_h)# window dimensions
strideY = 100
strideX = 100

data_patch_size = (30,46) # size the windows will be shrunk to for object detection

img = cv2.imread('picture6.png') # load an image to slide over

cv2.namedWindow('image',cv2.WINDOW_NORMAL) 
cv2.resizeWindow("image",int(img.shape[1]/2),int(img.shape[0]/2)) # shrink the image viewing window if you have large images

img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
## -------- end of, image and sliding window setup --------------------

## -------- sliding window vectorization steps --------------------------
num_vert_windows = len(np.arange(0,img.shape[0]-window[1],strideY)) # number of vertical windows that will be created
indx = np.arange(0,img.shape[0]-window[1],strideY)[:,None]+np.arange(window[1]) # index that will be broadcasted across image
vertical_windows = img[indx] # array of windows win_h tall and the full width of the image

vertical_windows = np.transpose(vertical_windows,(0,2,1)) # transpose to prep for broadcasting
num_horz_windows = len(np.arange(0,vertical_windows.shape[1]-window[0],strideX)) # number of horizontal windows that will be created
indx = np.arange(0,vertical_windows.shape[1]-window[0],strideX)[:,None]+np.arange(window[0]) # index for broadcasting across vertical windows
all_windows = vertical_windows[0:vertical_windows.shape[0],indx] # array of all the windows
## -------- end of, sliding window vectorization ------------------------

## ------- The below code rearranges and flattens the windows into a single matrix of pixels in columns and each window
## ------- in a row which makes evaluating a function over every window in a vectorized manner easier

total_windows = num_vert_windows*num_horz_windows

all_windows = np.transpose(all_windows,(3,2,1,0)) # rearrange for resizing and intuitive indexing

print('all_windows shape as stored in 2d matrix:', all_windows.shape)
for i in range(all_windows.shape[2]): # display windows for visual confirmation
    for j in range(all_windows.shape[3]):
        cv2.imshow('image',all_windows[:,:,i,j])
        cv2.waitKey(100)

all_windows = np.resize(all_windows,(win_h,win_w,total_windows))
print('all_windows shape after folding into 1d vector:', all_windows.shape)
for i in range(all_windows.shape[2]): # display windows for visual confirmation
    cv2.imshow('image',all_windows[:,:,i])
    cv2.waitKey(100)

# shrinking all the windows down to the size needed for object detect predictions
small_windows = cv2.resize(all_windows[:,:,0:all_windows.shape[2]],data_patch_size,0,0,cv2.INTER_AREA)
print('all_windows shape after shrinking to evaluation size:',small_windows.shape)
for i in range(small_windows.shape[2]): # display windows for vis. conf.
    cv2.imshow('image',small_windows[:,:,i])
    cv2.waitKey(100)

# flattening and rearranging the window data so that the pixels are in columns and each window is a row
flat_windows = np.resize(small_windows,(data_patch_size[0]*data_patch_size[1],total_windows))
flat_windows = np.transpose(flat_windows)
print('shape of the window data to send to the predicter:',np.shape(flat_windows))

results, max_index = Predict(flat_windows) # get predictions on all the windows