python numpy matrix reshape numba

Python - Reshape matrix by taking n consecutive rows every n rows


There are a number of questions about reshaping matrices with NumPy here on Stack Overflow. I have found one that is closely related to what I am trying to achieve. However, its answer is not general enough for my application, so here we are.

I have a matrix with millions of rows (shape m x n) that looks like this:

[[0, 0, 0, 0],
 [1, 1, 1, 1],
 [2, 2, 2, 2],
 [3, 3, 3, 3],
 [4, 4, 4, 4],
 [5, 5, 5, 5],
 [6, 6, 6, 6],
 [7, 7, 7, 7],
 [...]]

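From this I would like to get to a shape of m/2 x 2n, as shown below. To do that, one has to take a block of consecutive rows at regular intervals (blocks of 2 rows in this example; this block size is the "n" in the title and is independent of the column count n in the shape above). Each block taken this way is then stacked horizontally next to the untouched block of rows that precedes it. In this example that would mean: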

  1. The first two rows (rows zero and one) stay as they are.
  2. Take rows two and three and concatenate them horizontally to rows zero and one.
  3. Take rows six and seven and concatenate them horizontally to rows four and five. This concatenated block then becomes rows two and three of the result.
  4. ...

[[0, 0, 0, 0, 2, 2, 2, 2],
 [1, 1, 1, 1, 3, 3, 3, 3],
 [4, 4, 4, 4, 6, 6, 6, 6],
 [5, 5, 5, 5, 7, 7, 7, 7],
 [...]]
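
For reference, the steps above can be written as a plain loop over blocks. This is only a minimal sketch to pin down the desired output, not an efficient solution. The function name `stack_blocks_reference` and the block-height parameter `k` (2 in this example) are names introduced here for illustration, and the sketch assumes the number of rows is divisible by 2*k.

```python
import numpy as np

def stack_blocks_reference(a, k):
    """Slow reference version; k is the block height (hypothetical name)."""
    m, n = a.shape
    if m % (2 * k) != 0:
        raise ValueError("number of rows must be divisible by 2*k")
    out = np.empty((m // 2, 2 * n), dtype=a.dtype)
    for g in range(m // (2 * k)):
        src = 2 * k * g   # first input row of this group of 2*k rows
        dst = k * g       # first output row of this group
        out[dst:dst + k, :n] = a[src:src + k]          # untouched block goes left
        out[dst:dst + k, n:] = a[src + k:src + 2 * k]  # next block goes right
    return out

# The 8 x 4 example from above: row i is filled with the value i.
a = np.repeat(np.arange(8)[:, None], 4, axis=1)
r = stack_blocks_reference(a, 2)
print(r)
```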

How would I do that most efficiently (that is, with the least computation time) using NumPy? Would it make sense to speed the process up using Numba, or is there not much to gain?


Solution

  • Assuming your array's length is divisible by 4, here is one way to do it using numpy.hstack, after creating the index arrays that select the rows for the "left" and "right" parts of the resulting array:

    import numpy as np
    # Create the array
    N = 1000*4
    a = np.hstack([np.arange(0, N)[:, None]]*4) #shape (4000, 4)
    a
    array([[   0,    0,    0,    0],
           [   1,    1,    1,    1],
           [   2,    2,    2,    2],
           ...,
           [3997, 3997, 3997, 3997],
           [3998, 3998, 3998, 3998],
           [3999, 3999, 3999, 3999]])
    
    left_idx = np.array([np.array([0,1]) + 4*i for i in range(N//4)]).reshape(-1)
    right_idx = np.array([np.array([2,3]) + 4*i for i in range(N//4)]).reshape(-1)
    
    r = np.hstack([a[left_idx], a[right_idx]]) #shape (2000, 8)
    r
    array([[   0,    0,    0, ...,    2,    2,    2],
           [   1,    1,    1, ...,    3,    3,    3],
           [   4,    4,    4, ...,    6,    6,    6],
           ...,
           [3993, 3993, 3993, ..., 3995, 3995, 3995],
           [3996, 3996, 3996, ..., 3998, 3998, 3998],
           [3997, 3997, 3997, ..., 3999, 3999, 3999]])
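
A possible alternative that avoids building index arrays is to express the same rearrangement with `reshape` and `transpose`. The sketch below is not timed here and is only one way to do it. It assumes a regular 2-D NumPy array whose row count is divisible by 2*k, where the function name `stack_blocks` and the block-height parameter `k` are names chosen for illustration (k = 2 reproduces the result above).

```python
import numpy as np

def stack_blocks(a, k=2):
    """Place every second block of k rows to the right of the block before it."""
    m, n = a.shape
    if m % (2 * k) != 0:
        raise ValueError("number of rows must be divisible by 2*k")
    # Split the rows into axes (group, half, row within block, column).
    blocks = a.reshape(m // (2 * k), 2, k, n)
    # Reorder to (group, row within block, half, column), then merge
    # (group, row) into output rows and (half, column) into output columns.
    return blocks.transpose(0, 2, 1, 3).reshape(m // 2, 2 * n)

# Small check on the 8 x 4 example from the question.
a = np.repeat(np.arange(8)[:, None], 4, axis=1)
r = stack_blocks(a, k=2)
print(r)
```

The first `reshape` and the `transpose` do not move any data for a contiguous input; the final `reshape` generally has to copy, so the result is a new array rather than a view of `a`.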