I want to build a NumPy array of shape (HEIGHT, WIDTH, 3), where HEIGHT and WIDTH correspond to the shape of an image stored inside a standard NumPy array, in which every (i, j) position holds the immediate neighbors (in a certain direction) of that position. For example, if A = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]] and I want the neighbors of (1, 2) positioned to the north, I would get [1, 2, 3] as a result. So, my result matrix should have [1, 2, 3] = A[0, 1:4] at its respective (1, 2) position.
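A minimal sketch of the desired result for this example, assuming scale-1 neighborhoods, constant zero padding, and NumPy >= 1.20 (for sliding_window_view):

import numpy as np

A = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9]])

# Pad by 1 so that edge pixels also get a full 3-wide neighborhood.
padded = np.pad(A, 1, mode="constant")

# All 3x3 windows centered on each original pixel: shape (HEIGHT, WIDTH, 3, 3).
windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))

# The top row of each window holds the northern neighbors: shape (HEIGHT, WIDTH, 3).
north = windows[:, :, 0, :]
print(north[1, 2])  # -> [1 2 3]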
For now, I have tried a simple approach in which I do not use such a matrix; instead, I iterate over all values in my array and slice it accordingly to get the desired neighbors. Nonetheless, if I could compute that matrix beforehand, the algorithm I use afterwards could be vectorized (I don't include that part in the question because it's not relevant to my problem), which is almost always faster at the expense of more memory usage.
import numpy as np

scales = 3
padded_img = np.pad(img, scales, mode='constant')
feature_vectors = np.zeros((img.shape[0] * img.shape[1], 4 * scales))
z = 0
for i in range(scales, padded_img.shape[0] - scales):
    for j in range(scales, padded_img.shape[1] - scales):
        for scale in range(1, scales + 1):
            # Slices of the ring at distance `scale` around (i, j), one per direction
            N = padded_img[i - scale, j - scale: j + scale + 1]
            E = padded_img[i - scale: i + scale + 1, j + scale]
            S = padded_img[i + scale, j - scale: j + scale + 1]
            W = padded_img[i - scale: i + scale + 1, j - scale]
            neighbors = np.vstack((N, E, S, W))
            avgs = np.mean(neighbors, axis=1)
            feature_vectors[z, 4 * (scale - 1):4 * scale] = avgs.flatten()
        z += 1
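If needed afterwards, the flat feature matrix can be viewed per pixel again; this one-liner is my addition, and it works because z walks the image in row-major order:

feature_img = feature_vectors.reshape(img.shape[0], img.shape[1], 4 * scales)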
img is my original NumPy array; I pad it to avoid problems at the corners. On the other hand, I use scales because I need not only the immediate neighbors, but also those at distances 1 to scales from a certain position. Since I am also interested in all possible directions, I use N, E, S, W as my vectors of neighbors inside the loop. Above all, the idea is to reduce this algorithm's time complexity. Any ideas? Thank you.
EDIT: After getting those 4 vectors on every iteration, I compute their averages, flatten them, and append them to a feature vector whose rows contain this information for all 4 directions at every scale.
I am posting this here so that people who may come across this question can have the full picture. I encountered this problem when I was creating a Python implementation for the extraction of multi-scale features explained in the paper Segmentation of structural features in cheese micrographs using pixel statistics by G. Impoco, L. Tuminello, N. Fucà, M. Caccamo and G. Licitra. You can check it out on ScienceDirect.
I first created a naïve approach consisting of a not-so-elegant triple for-loop that went like this (posted in my original question):
import numpy as np

padded_img = np.pad(img, scales, mode="constant")
for i in range(scales, padded_img.shape[0] - scales):
    for j in range(scales, padded_img.shape[1] - scales):
        for scale in range(1, scales + 1):
            N = padded_img[i - scale, j - scale : j + scale + 1]
            E = padded_img[i - scale : i + scale + 1, j + scale]
            S = padded_img[i + scale, j - scale : j + scale + 1]
            W = padded_img[i - scale : i + scale + 1, j - scale]
This approach obtained the desired neighborhood information by using a variable scale that iterates over the predefined window radius, which allowed me to access the neighboring pixels of (i, j). Nevertheless, as @Daniel F pointed out, this can be expressed as a convolution and made simpler, like so:
import numpy as np
import scipy.ndimage

directions = 4
for scale in range(1, scales + 1):
    filter_size = 2 * (scale - 1) + 3
    orig = np.zeros((filter_size, filter_size), dtype=np.float64)
    orig[:, 0] = 1 / filter_size  # averaging filter along one edge of the window
    for c in range(directions):
        # c = 0 -> North; c = 1 -> East; c = 2 -> South; c = 3 -> West
        correlation_filter = np.rot90(orig, 3 - c, (0, 1))
        convolution_filter = np.flip(correlation_filter)
        directions_img = scipy.ndimage.convolve(img, convolution_filter, cval=0.0, mode="constant")
So, instead of iterating over every pixel and computing what I wanted that many times, this iterates over the desired directions (N, E, S and W in my naïve approach) and computes everything I need for every single pixel, without the need to actually iterate over every single one of them.
Naturally, the difference in performance is mind-blowing. I tested both versions with the timeit module (mean ± std. dev. of 7 runs, 1 loop each) and got the following benchmark with an image of shape (630, 630) and scales=1:

Naïve approach: 46.4 ± 1.65 s per loop
Convolution approach: 204 ± 11.7 ms per loop
You can check out my code for both versions in this GitHub gist. Keep in mind that this code was written specifically for grayscale images.