I trained a KMeans model:
from pyspark.ml.clustering import KMeans

kmeans = KMeans(k=20, seed=1)
df.show()  # inspect the training data; KMeans expects a "features" vector column by default
kmeans_model = kmeans.fit(df)
I simply want to count how many elements are in each cluster, but I can't find a straightforward way to do it.
I checked the PySpark documentation. Here is the answer:
summary = kmeans_model.summary
print(summary.clusterSizes)  # a plain Python list: element i is the number of points in cluster i
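
For a runnable end-to-end illustration, here is a minimal sketch; the toy feature values and the two-cluster setup are made up for this example, only the summary.clusterSizes call is the actual answer:

from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.getOrCreate()

# Toy data with a "features" vector column, which KMeans uses by default.
df = spark.createDataFrame(
    [(Vectors.dense([0.0, 0.0]),),
     (Vectors.dense([0.1, 0.1]),),
     (Vectors.dense([9.0, 9.0]),),
     (Vectors.dense([9.1, 9.1]),)],
    ["features"],
)

kmeans = KMeans(k=2, seed=1)
kmeans_model = kmeans.fit(df)

# Number of points assigned to each cluster, indexed by cluster id.
print(kmeans_model.summary.clusterSizes)
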
Reference:
http://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html#pyspark.ml.clustering.KMeans
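
As an alternative, if the summary is not available (for example on a model loaded from disk), you can get the same counts from the predictions themselves; this sketch assumes the default "prediction" output column name:

# Assign each row to a cluster, then count rows per cluster id.
predictions = kmeans_model.transform(df)
predictions.groupBy("prediction").count().show()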