r, data-mining, k-means

Kmeans function - amap package - what does nstart stand for?


I don't understand what nstart changes in the algorithm.

If centers = 8, the function will cluster the data into 8 groups. But what does nstart vary?

This is the explanation in the documentation:

centers:    
Either the number of clusters or a set of initial cluster centers. If the first, a random set of rows in x are chosen as the initial centers.

nstart:
If centers is a number, how many random sets should be chosen?

Solution

  • Unfortunately, ?kmeans doesn't explain this clearly (in either the stats or the amap package), but you can get an idea by looking at the kmeans code.

    If you use more than one random start (nstart greater than 1), kmeans runs the algorithm once for each random set of initial centers and returns the partition with the smallest total within-cluster sum of squares.

    (The output contains the total within-cluster sum of squares as tot.withinss.)
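
    To make the effect of nstart concrete, here is a minimal sketch using stats::kmeans. The toy data, the seed, and names such as n_starts and fits are made up for illustration. It runs kmeans several times with a single random start each, keeps the fit with the smallest tot.withinss, and then makes the equivalent built-in call with nstart = 10. That amap::Kmeans selects its result in the same way is taken from the answer above, not checked here.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Toy data (illustrative): three Gaussian blobs in two dimensions
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.5), ncol = 2)
)

# Manual multi-start: each call picks one random set of rows as initial centers
n_starts <- 10
fits <- lapply(seq_len(n_starts), function(i) kmeans(x, centers = 3, nstart = 1))

# Total within-cluster sum of squares of every run
wss <- sapply(fits, function(f) f$tot.withinss)

# Keep the run with the smallest total within-cluster sum of squares
best <- fits[[which.min(wss)]]

# Built-in equivalent: kmeans tries 10 random sets and returns the best fit
fit_multi <- kmeans(x, centers = 3, nstart = 10)

print(wss)
print(best$tot.withinss)
print(fit_multi$tot.withinss)
```

    If the values in wss differ between runs, some starts ended in a worse local optimum. A larger nstart makes it more likely that at least one start reaches a good partition, at the cost of proportionally more computation.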