Search code examples
pythonnumpysparse-matrixnumpy-ndarray

Downsampling by adding n^dim elements together in numpy/sparse


for a personal project I am trying to downsample a 3d grid, represented as sparse.COO array, with image like data (Index encodes spatial information, value encodes mass).

As far as I know, sparse covers the most important parts of the numpy API, so I will use numpy as an example from here on. Unfortunately, some Frameworks don't handle arrays with many non-zero entries very well.

I tried to write a function, that sums a block of 2^3=8 elements together, based on this question NumPy: sum every n columns of matrix, but my version messes up the indeces:

sparray = np.arange(10*10*10).reshape(10,10,10)

#number of entries to sum in every direction, 2 for only neighbours
num_entries = 2

#the last 3 axis of the reshaped array
axis = tuple(range(-sparray.ndim,0))

downsampled_array = sparray.reshape(
    tuple(
        [int(s/num_entries) for s in sparray.shape]+[num_entries]*sparray.ndim
    )
).sum(axis=axis)

A 2d example Example:

sparray = np.arange(4*4).reshape(4,4)
>>>array(
  [[ 0,  1,  2,  3],
   [ 4,  5,  6,  7],
   [ 8,  9, 10, 11],
   [12, 13, 14, 15]])

axis = tuple(range(-sparray.ndim,0))
>>>(-2, -1)

#This is what the code does:
>>>array(
  [[ 6, 22],
   [38, 54]])

#should do:
array(
[[10, 19],
 [42, 50]]
)

thanks in advance, it is probably some really stupid mistake.


Solution

  • Solved. One needs to interleave the extra dimensions:

    def downsample(sparray: sparse.COO, block_size: int):
        if any(x%block_size != 0 for x in sparray.shape):
            return IndexError('One of the Array dimensions is not divisible by the block_size')
    
        axis = tuple(range(1, 2 * sparray.ndim + 1, 2))
        shape = tuple(index for s in sparray.shape for index in (int(s/block_size), block_size))
        return sparray.reshape(shape).sum(axis=axis)