python dataframe numpy indexing numpy-slicing

What is the NumPy slicing notation in this code?


# split into inputs and outputs

X, y = data[:, :-1], data[:, -1]

print(X.shape, y.shape)

Can someone explain the second line of code with reference to specific documentation? I know it's slicing, but I couldn't find any reference for the notation ":-1" anywhere. Please point to the specific portion of the documentation.

Thank you

The code slices the data, most probably using NumPy, and data has shape (610, 14).


Solution

  • Per the docs:

    Indexing on ndarrays

    ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.
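
As a rough illustration of the three kinds of indexing named in the quote, here is a minimal sketch. The arrays `arr` and `records` and the field names `id` and `score` are made up for this example and do not come from the question.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)  # hypothetical array: [[0 1 2], [3 4 5]]

# Basic indexing: integers and slices only.
basic = arr[0, 1:]
print(basic)  # [1 2]

# Advanced indexing: lists (or arrays) of indices, paired element by element.
advanced = arr[[0, 1], [2, 0]]
print(advanced)  # [2 3]

# Field access: only for structured arrays, using a field name.
records = np.array([(1, 2.0)], dtype=[("id", "i4"), ("score", "f8")])
field = records["id"]
print(field)  # [1]
```

The `data[:, :-1]` expression from the question uses only slices and integers, so it falls under basic indexing.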

    1D array

    Slicing a 1-dimensional array is much like slicing a list

    import numpy as np
    
    
    np.random.seed(0)
    array_1d = np.random.random((5,))
    
    print(len(array_1d.shape))
    
    1
    

    NOTE: The len of the array shape tells you the number of dimensions.
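
A shorter way to get the same number is the `ndim` attribute. The sketch below uses two placeholder arrays (`vector` and `matrix`, names chosen here for illustration) to show that it matches `len(shape)`.

```python
import numpy as np

# Placeholder arrays used only for illustration.
vector = np.zeros(5)       # shape (5,)
matrix = np.zeros((5, 1))  # shape (5, 1)

# ndim is the number of axes, which equals len(shape).
print(vector.ndim, len(vector.shape))  # 1 1
print(matrix.ndim, len(matrix.shape))  # 2 2
```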

    We can use standard Python list slicing on the 1D array.

    # get the last element
    print(array_1d[-1])
    
    0.4236547993389047
    
    # get everything up to but excluding the last element
    print(array_1d[:-1])
    
    [0.5488135  0.71518937 0.60276338 0.54488318]
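
To make the ":-1" notation concrete: a negative index counts from the end, so `-1` stands for `len(a) - 1`. The sketch below uses a small made-up array (`values`) so the results are easy to check by eye.

```python
import numpy as np

values = np.arange(5)  # hypothetical example array: [0 1 2 3 4]
n = len(values)

last = values[-1]           # same element as values[n - 1]
all_but_last = values[:-1]  # same slice as values[:n - 1]

print(last)          # 4
print(all_but_last)  # [0 1 2 3]
```

The stop position of a slice is exclusive, which is why `[:-1]` leaves out the last element.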
    

    2D array

    array_2d = np.random.random((5, 1))
    
    print(len(array_2d.shape))
    
    2
    

    Think of a 2-dimensional array like a data frame. It has rows (the 0th axis) and columns (the 1st axis). numpy grants us the ability to slice these axes independently by separating them with a comma (,).

    # the 0th row and all columns
    print(array_2d[0, :])
    
    [0.79172504]
    
    # the 1st row and everything after + all columns
    print(array_2d[1:, :])
    
    [[0.52889492]
     [0.56804456]
     [0.92559664]
     [0.07103606]]
    
    # rows 1 through the second-to-last + the last column
    print(array_2d[1:-1, -1])
    
    [0.52889492 0.56804456 0.92559664]
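
Because `array_2d` above has a single column, the column part of each index is hard to see. Here is the same set of slices on a made-up 4x3 array (`grid`, not from the question), where rows and columns differ.

```python
import numpy as np

grid = np.arange(12).reshape(4, 3)  # hypothetical array
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]

first_row = grid[0, :]            # row 0, all columns
rows_from_1 = grid[1:, :]         # rows 1, 2 and 3, all columns
middle_last_col = grid[1:-1, -1]  # rows 1 and 2, last column only

print(first_row)          # [0 1 2]
print(rows_from_1.shape)  # (3, 3)
print(middle_last_col)    # [5 8]
```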
    

    Your Example

    # split into inputs and outputs
    
    X, y = data[:, :-1], data[:, -1]
    
    print(X.shape, y.shape)
    

    Note that data must have at least 2 dimensions, i.e. len(data.shape) >= 2 (otherwise you'd get an IndexError).

    This means data[:, :-1] is keeping all "rows" and slicing up to, but not including, the last "column". Likewise, data[:, -1] is keeping all "rows" and selecting only the last "column".
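
A minimal sketch of that split, assuming `data` is a plain 2-D NumPy array with the (610, 14) shape given in the question. The zeros are placeholder values, and `flat` is a made-up 1-D array included only to show the failure case.

```python
import numpy as np

# Placeholder data with the shape mentioned in the question.
data = np.zeros((610, 14))

X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)  # (610, 13) (610,)

# A 1-D array has no second axis, so the same expression fails.
flat = np.zeros(14)
try:
    flat[:, :-1]
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)
```

So `X` holds the first 13 columns of every row and `y` holds the 14th column as a 1-D array.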

    It's important to know that when you slice an ndarray using a colon (:) on every axis, you get an array with the same number of dimensions.

    print(len(array_2d[1:, :-1].shape))  # 2
    

    But if you "select" a specific index (i.e. use a single integer instead of a colon), you remove that axis and reduce the number of dimensions by one.

    print(len(array_2d[1, :-1].shape))  # 1, because I selected a single index value on the 0th axis
    
    print(len(array_2d[1, -1].shape))  # 0, because I selected a single index value on both the 0th and 1st axes
    

    You can, however, select a list of indices on either axis (assuming they exist).

    print(len(array_2d[[1], [-1]].shape))  # 1
    
    print(len(array_2d[[1, 3], :].shape))  # 2
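
One practical consequence, sketched on a made-up 4x3 array: if you want the last column to stay 2-D (a column of shape (n, 1) rather than a flat array), you can use a slice or a one-element list instead of the integer `-1`. The variable names here are illustrative only.

```python
import numpy as np

data = np.arange(12).reshape(4, 3)  # hypothetical 4x3 array

y_flat = data[:, -1]    # integer index drops the axis
y_slice = data[:, -1:]  # slice keeps the axis
y_list = data[:, [-1]]  # one-element list keeps the axis

print(y_flat.shape)   # (4,)
print(y_slice.shape)  # (4, 1)
print(y_list.shape)   # (4, 1)

# Basic slicing returns a view of the original data;
# list (advanced) indexing returns a copy.
print(np.shares_memory(data, y_slice))  # True
print(np.shares_memory(data, y_list))   # False
```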