Tags: python, numpy, multidimensional-array, indexing, sparse-matrix

Conversion of multi-dimension array to 2d and subsequent indexing


I have some code that is logically best set up as heavily nested arrays. The overall structure is high-dimensional and sparse, so to make it fit in memory I've had to convert it to a 2D matrix, as required by the sparse implementation.

I now find myself mentally switching between the two formats, which is complicated and confusing. I've written a small function that calculates the 2D cell from the nested indices, but if I want to do a range query it will get much more complicated.

import numpy as np

dim1 = 1
dim2 = 2
dim3 = 3
dim4 = 4 
dim5 = 5
dim6 = 6

sixD = np.arange(720).reshape(dim1, dim2, dim3, dim4, dim5, dim6)

twoD = sixD.transpose(0,1,2,3,4,5).reshape(dim1,-1)

def sixDto2DCell(a, b, c, d, e, f):
  return [a, (b*dim3*dim4*dim5*dim6) + 
    (c*dim4*dim5*dim6) + 
    (d*dim5*dim6) + 
    (e*dim6) + 
    f]

x, y = sixDto2DCell(0, 1, 2, 3, 4, 5)
assert(sixD[0, 1, 2, 3, 4, 5] == twoD[x, y])

So I'm trying to work out what I'd do for a query like

sixD[0, 1, 0:, 3, 4, 5]

to return the same values from the 2D matrix.

Will I need to write a new function, or have I missed a built-in NumPy way of achieving the same thing?

Any help would be greatly appreciated :-)


Solution

    Approach #1

    Here's one generic way to extract data from a 2D sparse matrix (or any 2D array) that is mapped to a corresponding n-dim array, given the start and end indices along each axis (end indices are exclusive, as in slicing) -

    def sparse_ndim_map_indices(ndim_shape, start_index, end_index):       
        """
        Get flattened indices for indexing into a sparse array mapped to
        a corresponding n-dim array.
        """        
    
        # Get shape and cumulative shape info for use to get flattened indices later
        shp = ndim_shape
        cshp = np.r_[np.cumprod(shp[::-1])[::-1][1:],1]
    
        # Create open-ranges
        o_r = np.ix_(*[s*np.arange(i,j) for (s,i,j) in zip(cshp,start_index,end_index)])
    
        id_ar = np.zeros(np.array(end_index) - np.array(start_index), dtype=int)
        for r in o_r:
            id_ar += r
        return id_ar
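
To see what `cshp` holds, here is a minimal sketch for the shape used in the question. The names `strides`, `index` and `flat` are illustrative only and are not part of the function above. Each entry is the number of flat positions skipped when the index along that axis increases by one (C order), so the flat position of a cell is the dot product of its index with these values.

```python
import numpy as np

shape = (1, 2, 3, 4, 5, 6)

# Same expression as cshp in the function above.
strides = np.r_[np.cumprod(shape[::-1])[::-1][1:], 1]
assert strides.tolist() == [720, 360, 120, 30, 6, 1]

# Flat position of a single cell: index . strides
index = (0, 1, 2, 3, 4, 5)
flat = int(np.dot(index, strides))

# np.ravel_multi_index computes the same value for a single cell.
assert flat == np.ravel_multi_index(index, shape)
```

The function applies the same arithmetic to whole ranges at once: each axis range is scaled by its stride, and broadcasting the scaled ranges together sums them into one n-dim array of flat indices.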
    

    Sample run using the arrays from the question -

    In [637]: start_index = (0,1,1,1,4,3)
         ...: end_index =   (1,2,3,4,5,6)
         ...: 
         ...: out1 = sixD[0:1, 1:2, 1:3, 1:4, 4:5, 3:6]
    
    In [638]: out1
    Out[638]: 
    array([[[[[[537, 538, 539]],
    
              [[567, 568, 569]],
    
              [[597, 598, 599]]],
    
    
             [[[657, 658, 659]],
    
              [[687, 688, 689]],
    
              [[717, 718, 719]]]]]])
    
    In [641]: idx = sparse_ndim_map_indices(sixD.shape, start_index, end_index)
    
    In [642]: twoD[:,idx.ravel()]
    Out[642]: 
    array([[537, 538, 539, 567, 568, 569, 597, 598, 599, 657, 658, 659, 687,
            688, 689, 717, 718, 719]])
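
Applied to the query from the question, `sixD[0, 1, 0:, 3, 4, 5]`, a scalar index `k` becomes the range `[k, k+1)` and the open slice `0:` becomes `[0, size of that axis)`. The sketch below repeats the function so it runs on its own. The second half, with the arrays `big` and `big2d`, is an assumption added for illustration: when the first axis is longer than 1, the 2D columns only cover the trailing axes, so the column indices would be computed from `shape[1:]` and the row selected separately.

```python
import numpy as np

def sparse_ndim_map_indices(ndim_shape, start_index, end_index):
    # Copied from Approach #1 so this block is self-contained.
    cshp = np.r_[np.cumprod(ndim_shape[::-1])[::-1][1:], 1]
    o_r = np.ix_(*[s * np.arange(i, j)
                   for (s, i, j) in zip(cshp, start_index, end_index)])
    id_ar = np.zeros(np.array(end_index) - np.array(start_index), dtype=int)
    for r in o_r:
        id_ar += r
    return id_ar

sixD = np.arange(720).reshape(1, 2, 3, 4, 5, 6)
twoD = sixD.reshape(sixD.shape[0], -1)

# Ranges equivalent to sixD[0, 1, 0:, 3, 4, 5]
start_index = (0, 1, 0, 3, 4, 5)
end_index = (1, 2, 3, 4, 5, 6)

idx = sparse_ndim_map_indices(sixD.shape, start_index, end_index)
result = twoD[0, idx.ravel()]
assert np.array_equal(result, sixD[0, 1, 0:, 3, 4, 5])

# Illustrative assumption: a first axis of length 2 instead of 1.
big = np.arange(2 * 720).reshape(2, 2, 3, 4, 5, 6)
big2d = big.reshape(big.shape[0], -1)

# Columns come from the trailing five axes only; the row is picked directly.
cols = sparse_ndim_map_indices(big.shape[1:], start_index[1:], end_index[1:])
assert np.array_equal(big2d[1, cols.ravel()], big[1, 1, 0:, 3, 4, 5])
```

With `dim1 = 1`, as in the question, the first-axis index is always 0, so the flat index and the column index coincide and `twoD[:, idx.ravel()]` works unchanged.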
    

    Approach #2

    Here's another way: create all combinations of indices along each axis, then use np.ravel_multi_index to get the flattened indices -

    import itertools
    
    def sparse_ndim_map_indices_v2(ndim_shape, start_index, end_index):    
        # Create ranges and hence get the flattened indices
        r = [np.arange(i,j) for (i,j) in zip(start_index,end_index)]
        return np.ravel_multi_index(np.array(list(itertools.product(*r))).T, ndim_shape)
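
A short usage sketch for this second version, reusing the ranges from the sample run of Approach #1. The function is repeated so the block runs on its own, and `expected` is an illustrative name. Because `itertools.product` varies the last axis fastest, the indices come out in the same C order as a flattened slice of the n-dim array.

```python
import itertools
import numpy as np

def sparse_ndim_map_indices_v2(ndim_shape, start_index, end_index):
    # Copied from Approach #2 so this block is self-contained.
    r = [np.arange(i, j) for (i, j) in zip(start_index, end_index)]
    return np.ravel_multi_index(
        np.array(list(itertools.product(*r))).T, ndim_shape)

sixD = np.arange(720).reshape(1, 2, 3, 4, 5, 6)
twoD = sixD.reshape(sixD.shape[0], -1)

start_index = (0, 1, 1, 1, 4, 3)
end_index = (1, 2, 3, 4, 5, 6)

idx = sparse_ndim_map_indices_v2(sixD.shape, start_index, end_index)

# The result is already 1-D, so no ravel() is needed before indexing.
assert idx.ndim == 1

expected = sixD[0:1, 1:2, 1:3, 1:4, 4:5, 3:6]
assert np.array_equal(twoD[0, idx], expected.ravel())
```

Unlike Approach #1, this version materialises every index combination as a Python tuple before converting to an array, so it is likely to be slower for large ranges.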