I need to cluster groups of points with the same linear relationship, as per the code and figure below.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 30, 100)
y1 = 3*x -50 + 20*np.random.random(size=len(x))
y2 = 3*x -20 + 20*np.random.random(size=len(x))
y3 = 3*x +10 + 20*np.random.random(size=len(x))
plt.plot(x, y1, 'o')
plt.plot(x, y2, 'o')
plt.plot(x, y3, 'o')
plt.xlim([-50,125])
plt.ylim([-50,125])
Obviously, I wouldn't have the points that way; I would just have the following x and y.
x_final = np.concatenate((x,x,x))
y_final = np.concatenate((y1,y2,y3))
plt.plot(x_final, y_final, 'o')
plt.xlim([-50,125])
plt.ylim([-50,125])
Note the following: the points respect linear relationships with high slope, they present a slight separation from each other, and they all have the same slope, just with different intercepts.
How would you suggest I cluster these points? I thought about using PCA and clustering the main components with k-means, but I don't know if there would be a more efficient way. In my real case I have more than three clusters and they have different distances from each other, even though they all have the same slope.
Take a look at all clustering algorithms scikit-learn offers :
Spectral clustering and Gaussian mixture should work for your use case.