python python-3.x numpy opencv python-multithreading

Multi-threaded image processing with openCV in python

I am quite new to python and have problems to parallize a part of my algorithm. Consider an input image thats need to be threshold in a certain way on pixel level. Since the algorithm only considers a specific area to calculate the threshold values, I'd like to run each chunk of the image in a seperate thread/process. And this is where I am stuck. I could not find a way that these threads work on the same image or how I can merge the results into a new image. Since I am usually from the java world I usually fighting the problem that I not want to interfere in other Thread. Therefore I just tried to pass each process the image.

def thresholding(img):
    stepSize = int(img.shape[0] / 10)
    futures = []
    with ProcessPoolExecutor(max_workers=4) as e:
        for y in range(0, img.shape[0], stepSize):
            for x in range(0, img.shape[1], stepSize):
                futures.append(e.submit(thresholdThread, y, x, img))
    concurrent.futures.wait(futures)
    return img


def thresholdThread(y, x, img):
    window_size = int(img.shape[0] / 10)
    window_shape = (window_size, window_size)
    window = img[y:y + window_shape[1], x:x + window_shape[0]]
    upper_bound, lower_bound, avg = getThresholdBounds(window, 0.6)

    for y_2 in range(0, window.shape[0]):
        for x_2 in range(0, window.shape[1]):
            tmp = img[y + y_2, x + x_2]
            img[y + y_2, x + x_2] = tmp if (tmp >= upper_bound or tmp <= lower_bound) else avg
    return str(avg)

As far as I understand python this does not work, because each process gets its own copy of img. But since img is of type ndarray of floats from numpy I have no idea if and how I could use shared object as descibed here.

FYI: I am using python 3.6.9. I do know that 3.7 is released, but installing everthing again so I can use spyder and openCV is not so easy.

Solution

You are not taking advantage of any of Numpy's vectorization techniques which can decrease processing time significantly. I'm assuming this is why you want to multiprocess operations on windows/chunks of the image(s) - I don't know what Docker is so I don't know whether that is a factor in your multiprocess approach.

Here is a vectorized solution with the caveat that it possibly excludes bottom and right edge pixels from the operations. If that is not acceptable no need to read any further.

The size of the right and bottom edge windows in your example are, more likely than not, different than the other windows. Looks like you arbitrarily chose a factor of ten to chunk up your image - if ten was an arbitrary choice, you can optimize the bottom and right edge deltas easily - I'll post that function at the end of the answer.

The image needs to be reshaped into patches to vectorize the operations . I've used an sklearn function sklearn.feature_extraction.image._extract_patches because it is convenient and allows creation of non-overlapping patches (which appears to be what you want). Notice the underscore prefix - this used to be an exposed function, image.extract_patches, but that has been deprecated. The function uses numpy.lib.stride_tricks.as_strided - it might be possible to just reshape the array but I haven't tried that.

Setup

import numpy as np
from sklearn.feature_extraction import image
img = np.arange(4864*3546*3).reshape(4864,3546,3)
# all shape dimensions in the following example derived from img's shape

Define the patch size (see opt_size below) and reshape the image.

hsize, h_remainder, h_windows = opt_size(img.shape[0])
wsize, w_remainder, w_windows = opt_size(img.shape[1])

# rgb - not designed for rgba
if img.ndim == 3:
    patch_shape = (hsize,wsize,img.shape[-1])
else:
    patch_shape = (hsize,wsize)

patches = image._extract_patches(img,patch_shape=patch_shape,
                                 extraction_step=patch_shape)
patches = patches.squeeze()

patches is a view of the original array changes to it will be seen in the original. Its shape is (8, 9, 608, 394, 3) There are 8x9, (608,394,3) windows/patches.

Find the upper and lower bounds of each patch; compare each pixel to the bounds for its patch; extract the indices for each pixel that is between its bounds and needs to be changed.

lower = patches.min((2,3)) * .6
lower = lower[...,None,None,:]
upper = patches.max((2,3)) * .6
upper = upper[...,None,None,:]
indices = np.logical_and(patches > lower, patches < upper).nonzero()

Find the mean of each patch then change the required pixel values,

avg = patches.mean((2,3))    # shape (8,9,3)
patches[indices] = avg[indices[0],indices[1],indices[-1]]

Function that puts it all together

def g(img, opt_shape=False):
    original_shape = img.shape

    # determine patch shape   
    if opt_shape:
        hsize, h_remainder, h_windows = opt_size(img.shape[0])
        wsize, w_remainder, w_windows = opt_size(img.shape[1])
    else:
        patch_size = img.shape[0] // 10
        hsize, wsize = patch_size,patch_size

    # constraint checking here(?) for
    #     number of windows,
    #     orphaned pixels

    if img.ndim == 3:
        patch_shape = (hsize,wsize,img.shape[-1])
    else:
        patch_shape = (hsize,wsize)

    patches = image._extract_patches(img,patch_shape=patch_shape,
                                     extraction_step=patch_shape)
    #squeeze??
    patches = patches.squeeze()

    #assume color (h,w,3)
    lower = patches.min((2,3)) * .6
    lower = lower[...,None,None,:]
    upper = patches.max((2,3)) * .6
    upper = upper[...,None,None,:]
    indices = np.logical_and(patches > lower, patches < upper).nonzero()

    avg = patches.mean((2,3))
##    del lower, upper, mask
    patches[indices] = avg[indices[0],indices[1],indices[-1]]

def opt_size(size):
    '''Maximize number of windows, minimize loss at the edge

    size -> int
       Number of "windows" constrained to 4-10
       Returns (int,int,int)
           size in pixels,
           loss in pixels,
           number of windows
    '''

    size = [(divmod(size,n),n) for n in range(4,11)]
    n_windows = 0
    remainder = 99
    patch_size = 0
    for ((p,r),n) in size:
        if r <= remainder and n > n_windows:
            remainder = r
            n_windows = n
            patch_size = p
    return patch_size, remainder, n_windows

Tested against your naïve process - I hope I executed it correctly. About a 35x improvement on the 4864x3546 color image. There are probably further optimizations maybe some wizards will comment.

Test using your chunk factor of ten:

#yours
def f(img):
    window_size = int(img.shape[0] / 10)
    window_shape = (window_size, window_size)

    for y in range(0, img.shape[0], window_size):
        for x in range(0, img.shape[1], window_size):

            window = img[y:y + window_shape[1], x:x + window_shape[0]]
            upper_bound = window.max((0,1)) * .6
            lower_bound = window.min((0,1)) * .6
            avg = window.mean((0,1))

            for y_2 in range(0, window.shape[0]):
                for x_2 in range(0, window.shape[1]):
                    tmp = img[y + y_2, x + x_2]
                    indices = np.logical_and(tmp < upper_bound,tmp > lower_bound)
                    tmp[indices] = avg[indices]


img0 = np.arange(4864*3546*3).reshape(4864,3546,3)
#get everything the same shape
size = img0.shape[0] // 10
h,w = size*10, size * (img0.shape[1]//size)
img1 = img0[:h,:w].copy()
img2 = img1.copy()

assert np.all(np.logical_and(img1==img2,img2==img0[:h,:w]))
f(img1)    # ~44 seconds
g(img2)    # ~1.2 seconds
assert(np.all(img1==img2))
if not np.all(img2==img0[:h,:w]):
    pass
else:
    raise Exception('did not change')

indices is an index array. It is a tuple of arrays, one for each dimension. indices[0][0],indices[1][0],indices[2][0] would be the index for one element in a 3d array. The complete tuple can be used to index multiple elements of an array.

>>> indices
(array([1, 0, 2]), array([1, 0, 0]), array([1, 1, 1]))
>>> list(zip(*indices))
[(1, 1, 1), (0, 0, 1), (2, 0, 1)]
>>> arr = np.arange(27).reshape(3,3,3)
>>> arr[1,1,1], arr[0,0,1],arr[2,0,2]
(13, 1, 20)
>>> arr[indices]
array([13,  1, 19])

# arr[indices] <-> np.array([arr[1,1,1],arr[0,0,1],arr[2,0,1]])

np.logical_and(patches > lower, patches < upper) returns a boolean array and nonzero() returns the indices of all the elements with a value of True.