Tags: python, arrays, algorithm, numpy, slice

Efficient way to delete columns and rows from a numpy array using slicing and not np.delete


Given an array A, a set of bad row indices, and a set of bad column indices, is it possible to use slicing to make a new array that does not have these rows or columns?

This can be done with np.delete as follows:

import numpy as np
A=np.random.rand(20,16)
bad_col=np.arange(0,A.shape[1],4)[1:]
bad_row=np.arange(0,A.shape[0],4)[1:]
Anew=np.delete(np.delete(A,bad_row,0),bad_col,1)
print('old shape ',A.shape)
print('new shape ',Anew.shape)

I also know that you can use slicing to select certain columns and rows from an array. But I'm wondering whether it can be used to exclude certain columns and rows and, if not, what the best way to do that is besides np.delete.

EDIT: Based on comments, it might not be possible with slicing in place. How about creating a new array with advanced indexing?

It can be done with the following code, but it is slow. I'm looking for a faster alternative:

good_col = [i for i in range(A.shape[1]) if i == 0 or i % 4 != 0]
good_row = [i for i in range(A.shape[0]) if i == 0 or i % 4 != 0]

Anew2=A[good_row,:][:,good_col]
print('new shape ',Anew2.shape)

Thank you


Solution

  • You cannot remove items from an array without either moving the remaining items (which is slow for large arrays) or creating a new array. There is no other solution.

    In Numba or Cython, you can directly create the new array with one operation instead of two, so it should be about twice as fast for large arrays. It should be even faster for small arrays, because Numpy functions have a significant overhead for small arrays.
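For comparison, the same "one operation instead of two" idea can be sketched in plain NumPy by combining boolean masks with np.ix_. This is not the Numba or Cython approach mentioned above, and its speed is not measured here; the names keep_row, keep_col and Anew3 are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((20, 16))

# Same "bad" indices as in the question: every 4th row/column except index 0.
bad_row = np.arange(0, A.shape[0], 4)[1:]
bad_col = np.arange(0, A.shape[1], 4)[1:]

# Boolean masks marking the rows and columns to keep (hypothetical names).
keep_row = np.ones(A.shape[0], dtype=bool)
keep_row[bad_row] = False
keep_col = np.ones(A.shape[1], dtype=bool)
keep_col[bad_col] = False

# np.ix_ turns the two masks into an open mesh, so a single advanced-indexing
# operation builds the new array (still a copy, not a view).
Anew3 = A[np.ix_(keep_row, keep_col)]

# Reference result: the double np.delete from the question.
Anew = np.delete(np.delete(A, bad_row, 0), bad_col, 1)

print(Anew3.shape)                  # (16, 13)
print(np.array_equal(Anew3, Anew))  # True
```

The result is still a new array. The only difference from the chained versions is that no intermediate array is created between the row step and the column step.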

    Numpy views are either contiguous or strided. There is no way to use a variable stride along a given axis. It has been defined that way for the sake of performance. Thus, if you want to select only the columns and rows with an even index, you can (because there is a constant stride for each axis that can be set for the resulting view). However, you cannot select, for example, all rows/columns with an index that is not divisible by 4 (because no view with a constant stride can be built for them).
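A small check of the view/copy distinction, using np.shares_memory to see whether a result still points into A. This is only a sketch; the array contents and the names even and picked are arbitrary choices for the illustration.

```python
import numpy as np

A = np.arange(20 * 16, dtype=np.float64).reshape(20, 16)

# Constant stride on each axis: basic slicing returns a view into A.
even = A[::2, ::2]
print(even.shape)                  # (10, 8)
print(np.shares_memory(even, A))   # True
print(A.strides, even.strides)     # (128, 8) (256, 16)

# Irregular selection (keep index 0 and every index not divisible by 4):
# no single stride describes it, so advanced indexing has to copy.
good_row = [i for i in range(A.shape[0]) if i == 0 or i % 4 != 0]
good_col = [i for i in range(A.shape[1]) if i == 0 or i % 4 != 0]
picked = A[np.ix_(good_row, good_col)]
print(picked.shape)                  # (16, 13)
print(np.shares_memory(picked, A))   # False
```

The strides of the even-index view are exactly twice those of A, which is what "constant stride" means here: one fixed byte step per axis.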

    Note that if you try to cheat by creating a new dimension and then flattening the view, Numpy will create a copy (because there is no other way).
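The "cheat" can be sketched as follows, assuming the number of rows is a multiple of 4. Note that this variant drops every row whose index is divisible by 4, including row 0, so it is slightly different from the selection in the question. The names blocks, kept and flat are made up for the illustration.

```python
import numpy as np

A = np.arange(20 * 16, dtype=np.float64).reshape(20, 16)

# Split the row axis into 5 blocks of 4 rows each: still a view.
blocks = A.reshape(5, 4, 16)

# Drop the first row of every block (rows 0, 4, 8, 12, 16): still a view,
# because each of the three axes has its own constant stride.
kept = blocks[:, 1:, :]

# Merging the first two axes back into one would need a variable stride,
# so reshape silently falls back to making a copy.
flat = kept.reshape(-1, 16)

print(np.shares_memory(blocks, A))  # True
print(np.shares_memory(kept, A))    # True
print(np.shares_memory(flat, A))    # False
print(flat.shape)                   # (15, 16)

# Same content as deleting those rows explicitly.
expected = np.delete(A, np.arange(0, A.shape[0], 4), axis=0)
print(np.array_equal(flat, expected))  # True
```

The 3-D view is free, but the moment a regular 2-D array is requested, the data has to be copied, which matches the statement above.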