machine-learning classification cluster-analysis k-means

k-means clustered data: how to label newly incoming data

I have a data set with labels that were produced by a k-means clustering algorithm. Now there is some data (with the same data structure) from another source, and I wonder what the most sensible way is to label this new, unseen data. I was thinking about either

calculating the distance to the prior k-means centroids and assigning each new point the label of its nearest centroid, or
training a new classifier (e.g. an SVM) on the old, already-labeled data and using it to label the new data
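The first option can be sketched in a few lines of NumPy. This sketch assumes the centroids from the original k-means run are available as an array and that the new data has been preprocessed (e.g. scaled) exactly like the original data. The function name `assign_to_nearest_centroid` and the example numbers are hypothetical.

```python
import numpy as np

def assign_to_nearest_centroid(X_new, centroids):
    """Return, for each row of X_new, the index of the closest centroid
    (squared Euclidean distance, which k-means itself minimizes)."""
    X_new = np.asarray(X_new, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Broadcast to shape (n_samples, n_clusters, n_features), then sum over features.
    sq_dists = ((X_new[:, np.newaxis, :] - centroids[np.newaxis, :, :]) ** 2).sum(axis=2)
    # The label is the index of the centroid with the smallest distance.
    return sq_dists.argmin(axis=1)

# Hypothetical centroids from the earlier k-means run and two new points.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
X_new = np.array([[1.0, 0.5], [9.0, 11.0]])
labels = assign_to_nearest_centroid(X_new, centroids)
print(labels)  # [0 1]
```

Because k-means partitions space by nearest centroid, this reproduces the labeling rule the clustering itself used, with no extra training.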

Unfortunately, I couldn't find anything about this particular problem. There are only a few questions about the general use of k-means as a classification model:

Can k-means clustering do classification?
How to segment new data with existing K-means model?

Thanks in advance.

Uli

Solution

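You don't need an SVM. The first approach is more convenient: assign each new point to the nearest of the existing k-means centroids. If you are using scikit-learn, the KMeans documentation at https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html includes an example; the `predict` method of a fitted model does exactly this for new samples.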
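A minimal sketch of the scikit-learn route, assuming the original data can be refit (or the fitted `KMeans` object was kept). The synthetic blobs below are a hypothetical stand-in for the real data set. The check at the end confirms that `predict` matches a manual nearest-centroid assignment using `cluster_centers_`.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical stand-in for the original data set: two well-separated blobs.
X_old = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

# Fit once on the original data (this produces the existing labels in kmeans.labels_).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_old)

# New data from another source with the same features and preprocessing.
X_new = np.array([[0.2, -0.1], [4.8, 5.3]])
new_labels = kmeans.predict(X_new)

# predict() assigns each point to the nearest entry of cluster_centers_.
dists = np.linalg.norm(X_new[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2)
assert (new_labels == dists.argmin(axis=1)).all()
print(new_labels)
```

Cluster indices are arbitrary (which blob is 0 and which is 1 depends on initialization), so compare new labels against `kmeans.labels_` rather than against fixed numbers.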
