I have the following input integer vectors (example):
4 138 233 461 610 621 669 742 814 827
89 138 334 656 697 810
138
138 196 738
659 738
4 461
138 337 756 810
8 138 196 337 468 663 664 756 809 810
They all contain integer values in the range 1-850 and are stored in a CSV file.
I want to divide them into multiple clusters based on how similar the vectors are, but I'm not sure how to implement a k-means algorithm for this input data in Java. Is anyone willing to help out with tips or code?
Thanks in advance.
Pseudo-code for k-means clustering
This assumes you have a metric (call it M) that compares two input objects (in your case, vectors) and outputs a measure of their similarity,
and a function (call it A) that calculates the average of a collection of input objects.
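To make M and A concrete, here is a rough Java sketch of the standard k-means loop (Lloyd's algorithm). It is only one way to do it and rests on a few assumptions of mine that are not in the question: each row is encoded as a 0/1 indicator vector over the values 1-850, M is the squared Euclidean distance (so smaller means more similar), A is the component-wise mean, and k is no larger than the number of rows. The class and method names (`KMeansSketch`, `encode`, `distance`, `average`, `cluster`) are made up for illustration.

```java
import java.util.Arrays;
import java.util.Random;

final class KMeansSketch {
    // Assumption: values lie in 1..850, so index 0 is simply unused.
    static final int DIM = 851;

    // Encode a row such as "138 196 738" as a 0/1 indicator vector.
    static double[] encode(int... ids) {
        double[] v = new double[DIM];
        for (int id : ids) v[id] = 1.0;
        return v;
    }

    // M: squared Euclidean distance (smaller means more similar).
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // A: component-wise mean of all points currently labelled `cluster`.
    static double[] average(double[][] points, int[] labels, int cluster, double[] fallback) {
        double[] mean = new double[fallback.length];
        int count = 0;
        for (int p = 0; p < points.length; p++) {
            if (labels[p] != cluster) continue;
            for (int i = 0; i < mean.length; i++) mean[i] += points[p][i];
            count++;
        }
        if (count == 0) return fallback; // empty cluster: keep the old centroid
        for (int i = 0; i < mean.length; i++) mean[i] /= count;
        return mean;
    }

    // Returns one cluster label (0..k-1) per input point. Requires k <= points.length.
    static int[] cluster(double[][] points, int k, int maxIterations, long seed) {
        int n = points.length;
        Random random = new Random(seed);

        // Initialise: shuffle the indices and take the first k points as centroids.
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        for (int i = n - 1; i > 0; i--) {
            int j = random.nextInt(i + 1);
            int t = order[i]; order[i] = order[j]; order[j] = t;
        }
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[order[c]].clone();

        int[] labels = new int[n];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: each point goes to its nearest centroid under M.
            boolean changed = false;
            for (int p = 0; p < n; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (distance(points[p], centroids[c]) < distance(points[p], centroids[best])) best = c;
                }
                if (labels[p] != best) { labels[p] = best; changed = true; }
            }
            if (!changed) break; // no assignment changed: converged

            // Update step: each centroid becomes the average (A) of its members.
            for (int c = 0; c < k; c++) centroids[c] = average(points, labels, c, centroids[c]);
        }
        return labels;
    }
}
```

To use it, you would parse each line of your file into ints, call `encode` on each row to build the `double[][]`, and then call something like `KMeansSketch.cluster(points, 3, 100, 42L)`. The result depends on the random starting centroids, so it is worth trying a few seeds and values of k. Since your vectors are really sets of IDs, a set-based measure such as Jaccard similarity might suit the data better, but then the "average" is no longer a plain mean and you are not strictly doing k-means any more.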
Also check out https://en.wikipedia.org/wiki/K-means_clustering