I am trying to understand a Udacity linear regression example which includes this:
data = np.loadtxt('data.csv', delimiter=',')  # This is known to be a 2-column, many-row array
X = data[:,:-1]
y = data[:,-1]
So, if I understand correctly, X is a 1-column array capturing all the columns of data except the last one (so in effect capturing the first column only), and y is a 1-column array capturing only the last column of data.
My question is why not write the code this way:
X = data[:,0]
y = data[:,1]
Would it not be clearer / cleaner?
X is an (n, 1) 2D array because slicing preserves the dimensionality. Alternative ways to write it would be:
X = data[:, :1]
X = data[:, 0, None]
X = data[:, 0].reshape(-1, 1)
y is an (n,) 1D array, because indexing with a single integer (here -1) drops that axis instead of preserving it.
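These shapes are likely important for the linear algebra used to implement the regression, which is why data[:, 0] is not a drop-in replacement: it would give X the shape (n,) rather than (n, 1).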
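To make the shape difference concrete, here is a minimal sketch. The small in-memory array is a made-up stand-in for data.csv, and the least-squares fit with np.linalg.lstsq is only an assumption about how the regression might be implemented, not necessarily what the Udacity example does.

```python
import numpy as np

# Hypothetical stand-in for data.csv: two columns, four rows (y = 2x + 1).
data = np.array([[1.0, 3.0],
                 [2.0, 5.0],
                 [3.0, 7.0],
                 [4.0, 9.0]])

X = data[:, :-1]      # slicing the column axis keeps it: shape (4, 1)
y = data[:, -1]       # an integer index drops the axis:  shape (4,)
X_flat = data[:, 0]   # the spelling proposed in the question: shape (4,)

print(X.shape, y.shape, X_flat.shape)

# The alternative spellings all produce the same (n, 1) array.
same = (np.array_equal(X, data[:, :1])
        and np.array_equal(X, data[:, 0, None])
        and np.array_equal(X, data[:, 0].reshape(-1, 1)))

# Assumed fitting step: ordinary least squares with an intercept column.
# Stacking X next to a column of ones relies on X being 2D.
A = np.hstack([X, np.ones_like(X)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # slope and intercept
```

With X_flat in place of X, np.hstack would join the two 1D arrays end to end into a single length-8 vector instead of building a two-column matrix, so the fit would no longer be set up correctly.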