I am generating a multi-dimensional line. Shouldn't the projection of the line onto each dimension be linear? The plots aren't.
from matplotlib import pyplot as plt
import numpy as np
n = 100 # samples
m = 2 # dimensions
X = np.random.randint(0, 100, size=(n, m))
b = np.random.randint(1, 3, m).reshape([m, 1])
y = np.dot(X, b)
for i in range(m):
    plt.scatter(X[:,i], y)
plt.show()
This looks like a possible misunderstanding of the dot product. Consider a scaled-down example with n=3:
In [572]: X
Out[572]:
array([[86, 85],
       [60, 37],
       [36, 57]])
In [573]: y
Out[573]:
array([[342],
       [194],
       [186]])
In [574]: 86+2*85
Out[586]: 256
In [587]: b
Out[587]:
array([[2],
       [2]])
Notice that the first value of y is 2*86 + 2*85 = 342, as expected from the definition of the dot product. So the ratio of y[0] to X[0][0] here is about 3.98. For the second row, the ratio of y[1] to X[1][0] is about 3.2. The ratio is clearly not constant, so y is not simply a fixed multiple of the first component of the X vectors, which is why your plots don't show a line. Why is this?
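A minimal sketch reproducing the scaled-down example with the same X and b values as the session above, so the row-wise dot product and the non-constant ratios can be checked directly:

```python
import numpy as np

# Same values as in the IPython session above
X = np.array([[86, 85],
              [60, 37],
              [36, 57]])
b = np.array([[2],
              [2]])

# Each entry of y is the dot product of one ROW of X with b
y = np.dot(X, b)   # shape (3, 1)
print(y.ravel())   # [342 194 186]

# y[0] combines BOTH components of X[0]
assert y[0, 0] == 2 * 86 + 2 * 85

# Ratio of y to the first component of each row: not constant
ratios = y.ravel() / X[:, 0]
print(np.round(ratios, 2))  # [3.98 3.23 5.17]
```

The third row (36, 57) makes the point even more clearly: its ratio is above 5, because the second component is large.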
Consider some other case where the first component of X[0] is 86; the second component could be anything in the given range (it is generated randomly). So why would we expect any particular ratio between the first component of X[0] and y[0]?
Imagine the case where the first component of X[0] was 0 and b was [2, 2]. y[0] is not guaranteed to be 0; it will be 2 times the second component of X[0]. The linear relationship is between each row vector of X and the corresponding scalar in y, not between an individual component of those vectors and the scalars in y.
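One way to see where the relationship actually lives: subtract the contribution of the second component from y, and what remains is exactly proportional to the first component. This sketch uses random data like the question's; the seeded `default_rng` generator is an addition for reproducibility, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility only
n, m = 100, 2
X = rng.integers(0, 100, size=(n, m))   # like np.random.randint(0, 100, size=(n, m))
b = rng.integers(1, 3, size=(m, 1))     # entries are 1 or 2
y = X @ b                               # shape (n, 1), same as np.dot(X, b)

# y vs X[:, 0] alone is not a line, because X[:, 1] varies independently...
# ...but once the X[:, 1] term is removed, what is left is exactly b[0] * X[:, 0]
residual = y[:, 0] - b[1, 0] * X[:, 1]
assert np.array_equal(residual, b[0, 0] * X[:, 0])
```

In other words, a scatter of y against X[:, i] is spread around a line of slope b[i], with the spread coming from the other column.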
What you may want instead is to have X be scalar-valued (X = np.random.randint(1, 100, size=n)) and y be a vector. Then generate y as X*b; with b of shape (m, 1) this broadcasts to shape (m, n), so row y[i] is b[i] times X. Now plotting X against each dimension of y gives a line.
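A minimal sketch of that suggestion, reusing n, m and b from the question. X is a 1-D array of scalars, and NumPy broadcasting turns X * b into an (m, n) array whose row i is b[i] * X, so each scatter is a straight line through the origin with slope b[i]. The legend labels are illustrative additions.

```python
import numpy as np
from matplotlib import pyplot as plt

n = 100  # samples
m = 2    # dimensions
X = np.random.randint(1, 100, size=n)           # shape (n,): one scalar per sample
b = np.random.randint(1, 3, m).reshape([m, 1])  # shape (m, 1)
y = X * b                                       # broadcasts to (m, n); y[i] == b[i] * X

for i in range(m):
    plt.scatter(X, y[i], label=f"dimension {i}")
plt.legend()
plt.show()
```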