Tags: python, numpy, plot, multidimensional-array, line

Generate multi-dimensional line


I am generating a multi-dimensional line. Shouldn't the projection of the line onto each dimension be linear? The plots aren't.

from matplotlib import pyplot as plt
import numpy as np

n = 100  # samples
m = 2  # dimensions

X = np.random.randint(0, 100, size=(n, m))
b = np.random.randint(1, 3, m).reshape([m, 1])

y = np.dot(X, b)

for i in range(m):
    plt.scatter(X[:, i], y)
    plt.show()

[Figures: scatter plots of y against Feature 1 and against Feature 2]


Solution

  • This looks like a possible misunderstanding of the dot product. Consider a scaled-down example with n=3:

    In [572]: X
    Out[572]: 
    array([[86, 85],
           [60, 37],
           [36, 57]])
    
    In [573]: y
    Out[573]: 
    array([[342],
           [194],
           [186]])
    
    In [574]: 2*86 + 2*85
    Out[574]: 342
    
    In [587]: b
    Out[587]: 
    array([[2],
           [2]])
    

    Notice that the first value of y is 2*86 + 2*85 = 342, as expected from the definition of the dot product. So the ratio of y[0] to X[0][0] here is about 4.0. For the second row, the ratio of y[1] to X[1][0] is about 3.2. The ratio is clearly not constant, so the first components of the X vectors don't have a linear relationship with y, as you saw in your plots. Why is this?

    Consider some other case where the first component of X[0] is 86; the second component could be anything in the given range (the values are generated randomly). So why would we expect any particular ratio between the first component of X[0] and y[0]?

    Imagine the case where the first component of X[0] was 0 and b was [2, 2]. y[0] is not guaranteed to be 0; y[0] will be 2 times the second component of X[0]. The relationship is between the vectors in X and the scalars in y, not between the individual components of the vectors in X and the scalars in y.

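    What you may want instead is to have X be scalar-valued (X = np.random.randint(1, 100, size=n)) and y be a vector. Then generate y as X*b; with b shaped (m, 1) as in your code, broadcasting makes y an (m, n) array whose row i is b[i]*X. Now plotting X against each dimension of y would give a line.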
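
The suggestion above can be sketched roughly as follows. This is only an illustration under a few assumptions that are not in the original post: the scalar inputs are renamed to `x`, the `(m, 1)` shape of `b` is taken from the question, and `np.random.default_rng` with a fixed seed is used purely so the run is reproducible.

```python
import numpy as np

# Assumption: a seeded generator, used only to make this sketch reproducible.
rng = np.random.default_rng(0)

n = 100  # samples
m = 2    # dimensions

# Scalar-valued inputs: one number per sample (1..99, so never zero).
x = rng.integers(1, 100, size=n)

# One slope per output dimension, shaped (m, 1) as in the question.
b = rng.integers(1, 3, size=m).reshape(m, 1)

# Broadcasting (m, 1) * (n,) gives shape (m, n): row i holds b[i] * x.
y = b * x

for i in range(m):
    # Each row is an exact multiple of x, so the ratio y[i] / x is constant.
    ratios = y[i] / x
    print(i, b[i, 0], np.allclose(ratios, b[i, 0]))
```

With this layout, something like `plt.scatter(x, y[i])` for each `i` should place every point on a straight line through the origin with slope `b[i]`, because each output dimension now depends on a single scalar input.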