Tags: matlab, machine-learning, cluster-analysis, k-means

Deterministic function in MATLAB for clustering


I have been using MATLAB's built-in kmeans function for clustering. Because the algorithm uses random initialization, the results change when I set a different seed. This is a little annoying. Is there a way to reduce the discrepancy between clustering results? Alternatively, is there a deterministic clustering function in MATLAB?


Solution

  • I came up with some methods to reduce the discrepancy of the clustering results.

    1. Pass 'OnlinePhase','on' to kmeans. The online update phase converges to a local minimum of the objective, which is often, though not always, the global minimum.
    2. Pass 'Replicates',5 to kmeans. MATLAB then runs kmeans 5 times from different starting centroids and returns the solution with the lowest total within-cluster sum of point-to-centroid distances. A larger number can be used in place of 5.
    3. Pass 'MaxIter',1000 to kmeans. This raises the maximum number of iterations from the default of 100 to 1000, which could improve the result, although it is unlikely to matter unless kmeans is reaching the iteration limit.

    The more reliably each run reaches the best kmeans solution, the more likely runs with different seeds are to give consistent results.
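
    The three options above can be combined in one call. The following is a minimal sketch, not a definitive recipe: the data X is synthetic and made up for illustration, the seeds and the variable names (opts, idx1, C1, sumd1, and so on) are arbitrary, and it assumes the Statistics and Machine Learning Toolbox is installed. It runs kmeans under two different seeds and compares the objective values rather than the raw labels, because cluster numbering can be permuted between runs even when the partition is the same.

```matlab
% Hypothetical synthetic data: three 2-D blobs around (0,0), (6,6), (6,-6).
rng(0);  % seed fixed here only so the example data is reproducible
X = [randn(100,2); ...
     randn(100,2) + 6; ...
     randn(100,2) + repmat([6 -6], 100, 1)];

k = 3;

% The three options from the list above, collected in one cell array.
opts = {'OnlinePhase', 'on', 'Replicates', 5, 'MaxIter', 1000};

% Run kmeans under two different seeds.
rng(1);
[idx1, C1, sumd1] = kmeans(X, k, opts{:});
rng(2);
[idx2, C2, sumd2] = kmeans(X, k, opts{:});

% Labels 1..k are arbitrary, so compare the total within-cluster
% distance and the sorted centroids instead of idx1 and idx2 directly.
fprintf('Objective, seed 1: %.4f\n', sum(sumd1));
fprintf('Objective, seed 2: %.4f\n', sum(sumd2));
disp(sortrows(C1));
disp(sortrows(C2));
```

    If the two objective values still differ on real data, increasing 'Replicates' is usually the first thing to try. Calling rng with a fixed seed immediately before kmeans makes a single run repeatable, but it does not by itself make the result independent of the seed.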