Tags: python, numpy, image-processing, multidimensional-array, signal-processing

Implementing 2D sliding window over np.array without overlaps (tiling, mean pooling)


I'm trying to implement a 2D sliding window of square shape (k, k) so that I can iterate over a frame of shape (n, m, 3) and calculate the mean of the pixel values in each window. On each iteration, the window should move to the next position without overlapping any values it has already covered. For example, given this matrix:

[
  [1, 2, 3, 4],
  [5, 6, 7, 8],
  [9, 10, 11, 12],
  [13, 14, 15, 16]
 ]

and k = 2, the first window would be:

 [
  [1, 2],
  [5, 6]
 ]

and the second window would be:

 [
  [3, 4],
  [7, 8]
 ]

and so on.

I have tried using numpy.lib.stride_tricks.as_strided, but without any success.

It is also important to use numpy or another efficient library, since implementing this with a Python for-loop is too expensive for this operation.


Solution

  • Ignoring the third dimension, since it doesn't seem to enter the problem, how about:

    import numpy as np

    # Generate an example array of shape (m*k, n*k).
    # Here m, n are the number of blocks along each axis,
    # i.e. your frame dimensions divided by k.
    m, n = 2, 4
    k = 3
    x = np.arange(m*n*k*k).reshape((m*k, n*k))
    # array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
    #        [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
    #        [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
    #        [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47],
    #        [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
    #        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71]])

    # Calculate the mean of each k x k block
    y = x.reshape((m, k, n, k))   # split each axis into (block index, offset within block)
    z = np.moveaxis(y, -3, -2)    # now shape is (m, n, k, k)
    np.mean(z, axis=(-2, -1))     # take the mean over the last two axes
    # or just np.mean(y, axis=(-3, -1)), which skips the moveaxis step
    # array([[13., 16., 19., 22.],
    #        [49., 52., 55., 58.]])
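
To tie this back to the question, here is a minimal sketch that applies the same reshape trick to the 4x4 matrix from the question with k = 2. The variable names (a, blocks, first, second, means) are chosen only for illustration, and the sketch assumes both dimensions are divisible by k.

```python
import numpy as np

k = 2
a = np.arange(1, 17).reshape(4, 4)   # the 4x4 matrix from the question

# Split each axis into (block index, offset within block), then bring the
# two block-index axes to the front so blocks[i, j] is one k x k window.
blocks = a.reshape(4 // k, k, 4 // k, k).swapaxes(1, 2)   # shape (2, 2, k, k)

first = blocks[0, 0]     # [[1, 2], [5, 6]]
second = blocks[0, 1]    # [[3, 4], [7, 8]]

# Mean of every window at once, with no Python-level loop.
means = blocks.mean(axis=(-2, -1))
# [[ 3.5  5.5]
#  [11.5 13.5]]
```

Indexing blocks[i, j] gives the individual windows if they are needed, while the mean over the last two axes gives the pooled result directly.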
    

    To preserve the third dimension and do the calculations on each color channel separately, just move it out of the way at the beginning (e.g. to axis 0), then move it back at the end.
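
As a rough sketch of the per-channel case: the variant below leaves the channel axis in last place and reshapes only the two spatial axes, which gives the same per-channel means as moving the channel axis out of the way and back. The helper name block_mean is hypothetical, and the sketch assumes the frame's height and width are both divisible by k.

```python
import numpy as np

def block_mean(frame, k):
    """Mean over non-overlapping k x k tiles of an (H, W, C) frame.

    Hypothetical helper; assumes H and W are both divisible by k.
    """
    h, w, c = frame.shape
    # Axes: (block row, row within block, block col, col within block, channel)
    tiles = frame.reshape(h // k, k, w // k, k, c)
    # Average over the two "within block" axes; channels stay separate.
    return tiles.mean(axis=(1, 3))   # shape (h // k, w // k, c)

# Small synthetic frame standing in for an image
frame = np.arange(4 * 6 * 3, dtype=float).reshape(4, 6, 3)
pooled = block_mean(frame, 2)        # shape (2, 3, 3)
```

Each pooled[i, j] is a length-3 vector holding the mean of one tile for each color channel.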

    If the number of rows and/or columns is not divisible by k, you could pad the array with a sentinel value that doesn't appear in the data (e.g. nan) until both dimensions are divisible by k. Then take the means in a way that ignores the sentinel values (e.g. nanmean). (Alternatively, split off the remainder rows/columns, handle them separately, and combine the results.)
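
The padding idea might look something like the sketch below for a 2D array. The helper name block_nanmean is hypothetical, and the sketch assumes the input contains no nan values of its own; it pads on the bottom and right only, so edge tiles are averaged over fewer real values.

```python
import numpy as np

def block_nanmean(x, k):
    """Mean over k x k tiles of a 2D array whose shape need not divide by k.

    Hypothetical helper; assumes x itself contains no nan values.
    """
    h, w = x.shape
    pad_h = -h % k   # rows needed to reach the next multiple of k
    pad_w = -w % k   # columns needed to reach the next multiple of k
    # nan requires a float dtype, so convert before padding.
    padded = np.pad(x.astype(float), ((0, pad_h), (0, pad_w)),
                    constant_values=np.nan)
    ph, pw = padded.shape
    tiles = padded.reshape(ph // k, k, pw // k, k)
    # nanmean skips the padding, so edge tiles average only real values.
    return np.nanmean(tiles, axis=(1, 3))

x = np.arange(15).reshape(3, 5)
result = block_nanmean(x, 2)
# [[ 3.   5.   6.5]
#  [10.5 12.5 14. ]]
```

Because fewer than k rows or columns are added, every tile keeps at least one real value, so nanmean never sees an all-nan tile here.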

    There might be something in scipy.ndimage or scikit-image that will do this in one line. I tried zoom, but didn't find a magic combination of settings that gave the desired result.