I'm a beginner to Apache Hadoop, and so far I have solved the Word Count problem using MapReduce for learning purposes. My objective is to perform K-means clustering on a data set of, say, 1.5 GB or more.
What is the simplest way to perform K-means clustering using Hadoop? Should I modify my map and reduce functions to suit K-means, do I need Mahout (I haven't used it before), or can the objective be achieved without it?
The host OS is Windows 7, and I have set up the Hortonworks Sandbox 2.3 on VirtualBox. Any help would be much appreciated, as I'm a bit confused about which path to choose to achieve my objective. Thanks in advance.
I think the easiest way to do K-means is to use the KMeans implementation in Apache Spark's MLlib. Spark can run on top of Hadoop and read its input from HDFS, so it fits your sandbox setup.
Here is an example; you can find more details on the Spark MLlib clustering documentation page.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

public class KMeansExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("K-means Example");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Load and parse the data (one space-separated vector per line)
    String path = "data/mllib/kmeans_data.txt";
    JavaRDD<String> data = sc.textFile(path);
    JavaRDD<Vector> parsedData = data.map(
      new Function<String, Vector>() {
        public Vector call(String s) {
          String[] sarray = s.split(" ");
          double[] values = new double[sarray.length];
          for (int i = 0; i < sarray.length; i++) {
            values[i] = Double.parseDouble(sarray[i]);
          }
          return Vectors.dense(values);
        }
      }
    );
    parsedData.cache();

    // Cluster the data into two classes using KMeans
    int numClusters = 2;
    int numIterations = 20;
    KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

    // Evaluate clustering by computing Within Set Sum of Squared Errors
    double WSSSE = clusters.computeCost(parsedData.rdd());
    System.out.println("Within Set Sum of Squared Errors = " + WSSSE);

    // Save and load the model
    clusters.save(sc.sc(), "myModelPath");
    KMeansModel sameModel = KMeansModel.load(sc.sc(), "myModelPath");
  }
}
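If you later want to reuse the saved model, a minimal sketch like the one below could load it back, print the cluster centres, and assign a new point to a cluster. The path "myModelPath" matches the one used above; the sample point is made up purely for illustration.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

public class KMeansPredictExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("K-means Predict Example");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Load the model that the training job saved earlier
    KMeansModel model = KMeansModel.load(sc.sc(), "myModelPath");

    // Print the learned cluster centres
    for (Vector center : model.clusterCenters()) {
      System.out.println("Cluster centre: " + center);
    }

    // Assign a new (made-up) point to its nearest cluster
    Vector point = Vectors.dense(0.1, 0.1, 0.1);
    System.out.println("Point " + point + " belongs to cluster " + model.predict(point));

    sc.stop();
  }
}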