hadoop, cluster-analysis, data-mining, mahout, k-means

Mahout clustering with one-dimensional k-means


Can I cluster data with one variable instead of many (which is what I have already tested) using the Mahout k-means algorithm? If yes (I hope so :) ), could you give me an example of such a clustering? Thanks.


Solution

  • How big is your data? If it is not exabytes, you would be better off without Mahout.

    If it is exabytes, use sampling, and then process it on a single machine.

    Mahout is not a general-purpose tool for data analysis. It only shines when you have Google-scale data; otherwise, the overhead is too large.
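
To illustrate the "sample, then process on a single machine" suggestion, here is a minimal sketch of one-dimensional k-means (Lloyd's algorithm) in plain Python. It is not Mahout code and not necessarily how any library implements it; the function name `kmeans_1d`, the synthetic data, and the sample size of 1000 are all assumptions made up for the example.

```python
import random


def kmeans_1d(values, k, iterations=100, seed=0):
    """Run Lloyd's algorithm on a list of scalars; return sorted centroids."""
    rng = random.Random(seed)
    # Start from k values picked at random from the data (hypothetical choice).
    centroids = sorted(rng.sample(values, k))
    for _ in range(iterations):
        # Assignment step: put each value into the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return sorted(centroids)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic one-variable data drawn around three made-up centers.
    data = [rng.gauss(mu, 0.5) for mu in (1.0, 5.0, 10.0) for _ in range(10_000)]
    # Cluster a random sample instead of the full data set.
    sample = rng.sample(data, 1000)
    print(kmeans_1d(sample, k=3))
```

Like any k-means run, the result depends on the initial centroids, so it can settle in a local optimum; trying a few different `seed` values and keeping the best result is a common workaround.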