Tags: python, arrays, numpy, slice, sub-array

What's the difference between np.array[:,0] and np.array[:,[0]]?


I have a numpy array cols2:

print(type(cols2))
print(cols2.shape)
<class 'numpy.ndarray'>
(97, 2)

I was trying to get the first column of this 2D NumPy array using the first snippet below, but I got a 1D vector instead of the single column of data I wanted. The second snippet seems to give me what I want, but I am confused about what the extra pair of brackets around the zero is doing.

print(type(cols2[:,0]))
print(cols2[:,0].shape)
<class 'numpy.ndarray'>
(97,)

print(type(cols2[:,[0]]))
print(cols2[:,[0]].shape)
<class 'numpy.ndarray'>
(97, 1)

Solution

  • cols2[:, 0] specifies that you want to slice out a 1D vector of length 97 from a 2D array. cols2[:, [0]] specifies that you want to slice out a 2D sub-array of shape (97, 1) from the 2D array. The square brackets [] make all the difference here: a bare integer index removes the axis it selects from, whereas a list of indices keeps that axis (here with length 1).

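    import numpy as np

    v = np.arange(6).reshape(-1, 2)   # 3x2 array: [[0, 1], [2, 3], [4, 5]]
    
    v[:, 0]      # integer index: the column axis is dropped
    array([0, 2, 4])
    
    v[:, [0]]    # list index: the column axis is kept
    array([[0],
           [2],
           [4]])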
    

    The fundamental difference is the extra dimension in the result of the latter expression (as you've noted). This is intended behaviour, as implemented in numpy.ndarray.__getitem__ and __setitem__ and codified in the NumPy documentation.
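
To make the shape difference concrete, here is a minimal sketch. The array below is only a hypothetical stand-in for the question's cols2 (any array of shape (97, 2) would behave the same way), and the variable names are illustrative.

```python
import numpy as np

# Hypothetical stand-in for the question's cols2: any (97, 2) array will do.
cols2 = np.arange(194).reshape(97, 2)

as_vector = cols2[:, 0]      # integer index: axis 1 is removed
as_column = cols2[:, [0]]    # list index: axis 1 is kept with length 1

print(as_vector.shape, as_vector.ndim)   # (97,) 1
print(as_column.shape, as_column.ndim)   # (97, 1) 2

# The values are identical; only the shape differs.
print(np.array_equal(as_vector, as_column.ravel()))   # True

# Two other ways to turn the 1D result back into a column:
print(as_vector[:, np.newaxis].shape)    # (97, 1)
print(as_vector.reshape(-1, 1).shape)    # (97, 1)
```

Which form you want usually depends on what consumes the result: code that expects a 2D column needs the (97, 1) shape, while plain element-wise arithmetic is often simpler on the 1D vector.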

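    You can also write cols2[:, 0:1] to get a column sub-slice with the same (97, 1) shape. One difference: a slice is basic indexing and returns a view of the original data, whereas the list form cols2[:, [0]] is advanced indexing and returns a copy.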

    v[:, 0:1]
    array([[0],
           [2],
           [4]])
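
The slice form and the list form give the same shape, but they are not interchangeable when you write to the result. The sketch below, which reuses the small array v from above (the names by_slice and by_list are illustrative), checks this with np.shares_memory.

```python
import numpy as np

v = np.arange(6).reshape(-1, 2)   # [[0, 1], [2, 3], [4, 5]]

by_slice = v[:, 0:1]   # basic slicing
by_list = v[:, [0]]    # advanced (integer-array) indexing

print(by_slice.shape == by_list.shape)   # True: both are (3, 1)
print(np.shares_memory(v, by_slice))     # True: the slice is a view of v
print(np.shares_memory(v, by_list))      # False: the list index made a copy

# Writing through the view changes v; writing to the copy does not.
by_slice[0, 0] = 99
print(v[0, 0])   # 99
by_list[1, 0] = -1
print(v[1, 0])   # 2
```

If you need an independent column from the slice form, calling .copy() on the result is one option.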
    

    For more information, look at the notes on Advanced Indexing in the NumPy docs.