Search code examples
pythonarraysaverage

Averaging arrays over a grid in Python


I would like to average a d-dimensional array M of size NxNx...xN (d times) over a regular grid of step k, i.e. building a new array N of size NxNx...xN (d times) where in each hypercube H of the regular grid of step k, the value of the array are all the same and equal to the average of H in the original array.

More clearly on example, for d=2, N=4 and k=2:

M=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]

and

N=[[1.5,1.5,3.5,3.5],[1.5,1.5,3.5,3.5],[1.5,1.5,3.5,3.5],[1.5,1.5,3.5,3.5]]

This related to an existing question : Grouping 2D numpy array in average Yet, this only works for 2D array and not exactly outputs the array in the shape I want.

I have tried to implement it my self in the case of d=2, yet doing similarly for higher dimension would imply to repeat d-times for loops, which seems prohibitive (I don't see how to do it otherwise). I am sure there is a more compact and efficient way to do so. My try for d=2 :

def local_average_img(M,k):
  l=len(M[0])
  N = np.zeros((l,l))
  for x in range(l):
    for y in range(l):
      print(np.mean(M[(x//k)*k:(x//k)*k+k, (y//k)*k:(y//k)*k+k]))
      N[x][y]=np.mean(M[(x//k)*k:(x//k)*k+k, (y//k)*k:(y//k)*k+k])
  return(N)

M = np.array([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
k = 2
result = local_average_img(M, k)
print(result)

Solution

  • The solution below can help you. As you can see, you are not getting rid of all the looping (I would say that you cannot, but a man can dream). The pros are that the "heavy looping" is done to compute indexes, and form slices. Then the actual averaging is done in numpy which is faster than nesting for loops. Moreover, you do not have to adapt the code (e.g. adding/removing loops) when d changes.

    import itertools
    import numpy as np
    
    d = 2   # number of dimensions
    n = 4   # size of each dimension
    k = 2   # side of the hypercube to average in
    
    # Define M input hypercube and initialize "empty" H hypercube
    M = (np.random.rand(*[n for i in range(d)]) * 10).astype(int)
    H = np.zeros(M.shape, dtype=np.float32)
    
    # Compute the index of the first element in each sub-hypercube.
    # In the 2D example, it is the top-left corner of the "square".
    indexes = [i for i in range(n)
           if i % d == 0]
    indexes = itertools.product(indexes, repeat = d)
    
    for idx in indexes:
        # Also compute the index of the last element of the sub-hypercube.
        # In the 2D example, it is the bottom-right corner of the "square".
        idx_i = np.array(idx)
        idx_f = idx_i + k
        # Define a tuple of slices objects to index the sub-hypercube.
        nd_slice = []
        for dim in range(d):
            s = slice(idx_i[dim], idx_f[dim], 1)
            nd_slice.append(s)
        nd_slice = tuple(nd_slice)  
        # Compute the average in the sub-hypercube and substitute it to the current values.
        H[nd_slice] = np.mean(M[nd_slice])
    
    print(M)
    print(H) 
    

    Disclaimer. I am confident that the for loops can be made more efficient. Indeed, indexes are cast to np.array and then to slices, which are gathered in a list and finally converted to a tuple. To me, it looks too messy to be the optimal solution, but I am not able to make it better...

    Hope this helps!