java apache-spark apache-spark-ml

How to set epsilon in K-Means in the new Spark ML library


In the spark.mllib library, KMeans has a setEpsilon() method for setting the epsilon parameter when building a KMeans instance.

But I could not find any method in the new spark.ml KMeans class for setting this parameter. I am asking because the new KMeans generates fewer clusters than I specify with setK(), so I want to increase the number of clusters generated by decreasing epsilon a bit.

Does anyone know how to set epsilon in the new spark.ml KMeans class?

org.apache.spark.ml.clustering.KMeans

Thanks.


Solution

  • In the spark.ml library, epsilon has been renamed to tol (short for tolerance) and is set with setTol().

    Example:

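    import org.apache.spark.ml.clustering.KMeans;
    import org.apache.spark.ml.clustering.KMeansModel;
    KMeans kmeans = new KMeans().setK(2).setSeed(1L).setTol(0.0001); // tol plays the role of mllib's epsilon
    KMeansModel model = kmeans.fit(dataset); // dataset: a Dataset<Row> with a "features" vector column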
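To show what tol actually controls, here is a hypothetical, stdlib-only Java sketch (the class TolKMeansSketch and its methods are illustrative names, not Spark API). As far as I know, spark.ml's KMeans forwards tol to the underlying mllib implementation as epsilon, and iteration stops once every centroid moves less than that distance in a single iteration. The sketch implements plain Lloyd's k-means with that stopping rule only; it does not reproduce Spark's k-means|| initialization or distributed execution.

```java
import java.util.Arrays;

// Hypothetical illustration (not Spark code): Lloyd's k-means that stops once
// every centroid moves less than tol, the role epsilon/tol plays in Spark.
final class TolKMeansSketch {

    /** Final centroids plus the number of iterations actually run. */
    static final class Result {
        final double[][] centers;
        final int iterations;

        Result(double[][] centers, int iterations) {
            this.centers = centers;
            this.iterations = iterations;
        }
    }

    /** Squared Euclidean distance between two points of equal dimension. */
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Runs Lloyd iterations from initCenters until every center moves less than tol, or maxIter is hit. */
    static Result run(double[][] points, double[][] initCenters, int maxIter, double tol) {
        int k = initCenters.length;
        int dim = points[0].length;
        double[][] centers = new double[k][];
        for (int j = 0; j < k; j++) centers[j] = initCenters[j].clone();

        int iter = 0;
        boolean converged = false;
        while (!converged && iter < maxIter) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: each point joins its nearest center.
            for (double[] p : points) {
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (sqDist(p, centers[j]) < sqDist(p, centers[best])) best = j;
                }
                counts[best]++;
                for (int d = 0; d < dim; d++) sums[best][d] += p[d];
            }
            // Update step: move each center to its cluster mean and compare the move with tol.
            converged = true;
            for (int j = 0; j < k; j++) {
                if (counts[j] == 0) continue; // an empty cluster keeps its old center
                double[] updated = new double[dim];
                for (int d = 0; d < dim; d++) updated[d] = sums[j][d] / counts[j];
                if (sqDist(updated, centers[j]) > tol * tol) converged = false;
                centers[j] = updated;
            }
            iter++;
        }
        return new Result(centers, iter);
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 0.0}, {1.0, 0.0}, {9.0, 9.0}, {10.0, 9.0}};
        double[][] init = {{0.0, 0.0}, {1.0, 0.0}};
        Result r = run(points, init, 20, 1e-4);
        System.out.println(Arrays.deepToString(r.centers) + " after " + r.iterations + " iterations");
    }
}
```

With tol = 1e-4 the sketch runs 3 iterations and ends at centers [[0.5, 0.0], [9.5, 9.0]]; with tol = 10.0 on the same data it stops after 1 iteration. In both cases it still returns k centers, because tol only decides when iteration stops. That suggests lowering tol by itself is unlikely to raise the number of clusters Spark returns.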