cluster-analysis, k-means, unsupervised-learning, mini-batch

Difference between Mini-Batch K-Means and Sequential/Online K-Means


I am trying out examples of K-Means and its variants using the scikit-learn module sklearn.cluster. What is the difference between mini-batch K-Means clustering and online/sequential K-Means clustering?

I could not find an implementation of online K-Means in scikit-learn. If the batch size is 1, will mini-batch K-Means act as online K-Means?


Solution

  • Mini-batch k-means does not converge to a local optimum.

    Essentially, it repeatedly performs one step of k-means on a subsample of the data. Because different samples may have different optima, it does not settle on the best solution but keeps moving around between the solutions for different parts of the data. You stop after a fixed number of iterations; otherwise it would run forever. If you have huge, well-behaved data, this may not make a big difference. If you have a difficult data set and not so much data, a fast (non-Lloyd) k-means will find a better solution and take only a few iterations. I doubt that many people have data sets so large that mini-batch is a good idea.
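To make the "one step of k-means on a subsample, repeated for a fixed number of iterations" idea concrete, here is a minimal NumPy sketch. It assumes the per-center step size of 1/count that is common in descriptions of mini-batch k-means; it is not scikit-learn's actual `MiniBatchKMeans` code, and the function name `mini_batch_kmeans` and its parameters are made up for illustration. Under that assumption, setting `batch_size=1` reduces each iteration to the sequential/online update of a single center by a single point.

```python
import numpy as np


def mini_batch_kmeans(X, k, batch_size, n_iter, seed=0):
    """Illustrative mini-batch k-means (hypothetical helper, not sklearn's)."""
    rng = np.random.default_rng(seed)
    # Initialize the centers from k distinct random data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)  # number of points each center has absorbed so far

    for _ in range(n_iter):  # fixed iteration budget, no convergence test
        batch = X[rng.choice(len(X), size=batch_size, replace=False)]
        # Assign every batch point to its nearest center (centers held fixed).
        dists = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center toward its assigned points with step size 1 / count.
        for x, j in zip(batch, labels):
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]
    return centers


# Synthetic data: three Gaussian blobs (made-up example data).
rng = np.random.default_rng(42)
blob_means = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([m + rng.normal(size=(200, 2)) for m in blob_means])

centers_batch = mini_batch_kmeans(X, k=3, batch_size=50, n_iter=100)
# batch_size=1: one point per iteration, i.e. an online-style update.
centers_online = mini_batch_kmeans(X, k=3, batch_size=1, n_iter=5000)
```

Because every iteration sees a different random sample, the centers keep shifting slightly from step to step, which is why the loop stops on an iteration count rather than on convergence. In scikit-learn itself, `MiniBatchKMeans.partial_fit` can be called on successive chunks of data for incremental use, though its internals differ from this sketch.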