machine-learning classification cluster-analysis k-means

k-means clustered data: how to label newly incoming data


I have a data set with labels that were produced by a k-means clustering algorithm. Now there is some data (with the same structure) from another source, and I wonder what the most sensible way is to label this new, as yet unseen data. I was thinking about either

  • calculating the distance to the prior k-means centroids and labeling each new data point with its nearest centroid
  • running a new algorithm (e.g. SVM) on the new data, using the old data as the training set
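
To make the first option concrete, here is a minimal sketch of a nearest-centroid lookup in plain Python. The function name, the centroid values, and the new points are all made up for illustration, and Euclidean distance is assumed because that is what standard k-means uses.

```python
import math

def assign_to_nearest_centroid(points, centroids):
    """Return, for each point, the index of the closest centroid (Euclidean distance)."""
    labels = []
    for p in points:
        # Distance from this point to every stored centroid.
        distances = [math.dist(p, c) for c in centroids]
        # The label is the position of the smallest distance.
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical centroids kept from the earlier k-means run.
old_centroids = [(0.0, 0.0), (10.0, 10.0)]

# Hypothetical observations from the new source, same features in the same order.
new_points = [(1.0, 2.0), (9.0, 8.5)]

new_labels = assign_to_nearest_centroid(new_points, old_centroids)
```

This only gives meaningful labels if the new data went through the same preprocessing (for example scaling) as the data the centroids were computed from.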

Unfortunately, I couldn't find anything about this particular problem. There are only a few questions about the general use of k-means as a classification model:

Thanks in advance.

Uli


Solution

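  • You don't need an SVM for this; the first approach is simpler. If you are using scikit-learn, the KMeans documentation (https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) includes an example. The predict function of the fitted model will do the job: it assigns each new sample to its closest cluster centre.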
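
A minimal sketch of that approach with scikit-learn follows. The data is synthetic and the variable names (X_old, X_new) are placeholders rather than anything from the question; the last lines check that predict gives the same result as a manual nearest-centroid lookup on cluster_centers_.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the original data: two well-separated blobs.
rng = np.random.default_rng(0)
X_old = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

# Fit once on the old data; the fitted model stores the centroids.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_old)

# Stand-in for data from the new source (same features, same order).
X_new = np.array([[0.2, -0.1], [4.8, 5.3]])

# Label the new data with the existing clusters.
new_labels = kmeans.predict(X_new)

# Equivalent manual computation: Euclidean distance to each centroid, then argmin.
dists = np.linalg.norm(
    X_new[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2
)
manual_labels = dists.argmin(axis=1)
```

If the original model object is no longer available but the centroids are, the manual computation in the last lines can be used on its own with the stored centroid array.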