I know that Mahout is used for batch processing, but I would like to know whether, and how, I can use its KMeans to cluster individual points.
Let's say that we have the following situation
Can I do this using Mahout, or do I have to implement it myself? I thought of setting the number of iterations to 1 and assigning the point that way, but KMeans recomputes the cluster centroids, and if the new point is an outlier, it makes a new cluster from it. I don't want that; I just want the distance to the closest centroid.
For now, it seems that KMeans is not very appropriate for this and that it should be implemented separately. Is that correct?
Thanks
You don't need to use Mahout for this.
K-means assigns points to the nearest center.
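So just get all centers (which should fit easily into RAM), and compute the squared Euclidean distance (the sum of squared differences) from your point to each center. The center with the smallest distance is the point's cluster.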
It's just a few CPU cycles; there is absolutely no benefit in trying to do this on Mahout. The overhead will be much too large for just k distance computations.
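A minimal sketch of that lookup in plain Java, assuming the centers have already been read out of the clustering result into a `double[][]` (how you export them is not shown here). The class and method names are made up for illustration and are not part of Mahout.

```java
public class NearestCentroid {

    /** Squared Euclidean distance between two vectors of equal length. */
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the index of the center closest to the point (-1 if there are no centers). */
    public static int nearest(double[][] centers, double[] point) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centers.length; k++) {
            double dist = squaredDistance(centers[k], point);
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical centers; in practice these come from your finished k-means run.
        double[][] centers = { {0.0, 0.0}, {10.0, 10.0}, {-5.0, 5.0} };
        double[] point = {8.0, 9.0};

        int k = nearest(centers, point);
        // Take the square root only once, for the winning center.
        double distance = Math.sqrt(squaredDistance(centers[k], point));
        System.out.println("cluster " + k + ", distance " + distance);
    }
}
```

The centers are never updated here, so an outlier cannot create or shift a cluster; it simply ends up with a large distance, which you can threshold if you want to flag it.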