Tags: python, numpy, linear-regression

Why use negative index in Numpy slicing of a 2-D array?


I am trying to understand a Udacity linear regression example which includes this:

import numpy as np

data = np.loadtxt('data.csv', delimiter=',')  # known to be a two-column, many-row array
X = data[:, :-1]
y = data[:, -1]

So, if I understand correctly, X is a 1-column array containing all the columns of data except the last one (so, in effect, only the first column), and y is a 1-column array containing only the last column of data.

My question is why not write the code this way:

X = data[:,0]
y = data[:,1]

Would it not be clearer / cleaner?


Solution

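  • X is an (n, 1) 2D array because slicing with :-1 keeps the column axis, whereas indexing with an integer, as in data[:, 0], drops that axis and returns an (n,) 1D array. Equivalent ways to get the (n, 1) shape would be: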

    X = data[:, :1]
    X = data[:, 0, None]
    X = data[:, 0].reshape(-1, 1)
    

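    y is an (n,) 1D array, because -1 is an integer index rather than a slice. For a two-column array, data[:, 1] gives the same result.

    These shapes likely matter for the linear algebra used to implement the regression, which typically expects a 2D feature matrix X and a 1D target vector y. The negative indices also keep working unchanged if data gains more feature columns, as long as the target stays in the last column.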
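
The shape difference can be checked directly. The following is a minimal sketch, not the Udacity code: the small in-memory array is a hypothetical stand-in for data.csv, and the least-squares step via np.linalg.lstsq is only an assumption about how the regression might be implemented.

```python
import numpy as np

# Hypothetical stand-in for data.csv: 5 rows, 2 columns (feature, target),
# generated from y = 2*x + 1 so the fitted coefficients are easy to check.
data = np.array([[1.0, 3.0],
                 [2.0, 5.0],
                 [3.0, 7.0],
                 [4.0, 9.0],
                 [5.0, 11.0]])

X = data[:, :-1]     # slice on the column axis: stays 2D
y = data[:, -1]      # integer index on the column axis: becomes 1D
x_flat = data[:, 0]  # integer index again: also 1D

print(X.shape, y.shape, x_flat.shape)  # (5, 1) (5,) (5,)

# Prepend a column of ones for the intercept term.
# This works because X is 2D with shape (5, 1).
ones = np.ones((X.shape[0], 1))
A = np.hstack([ones, X])  # shape (5, 2)

# The same call with the 1D version fails, since the inputs
# no longer have the same number of dimensions.
try:
    np.hstack([ones, x_flat])
    stacked_1d = True
except ValueError:
    stacked_1d = False

# Ordinary least squares: theta[0] is the intercept, theta[1] the slope.
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta)  # approximately [1, 2] for this data
```

With a third column added to data, X = data[:, :-1] would become (n, 2) and y = data[:, -1] would still pick the last column, while data[:, 0] and data[:, 1] would need to be rewritten.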