I am testing X-means clustering (xmeans) on the two-dimensional array below, using smile-core 2.4.0.
import smile.clustering.xmeans
val arrData = Array(
  Array(0.0, 0.0, 0.0),
  Array(0.0, 0.0, 0.0),
  Array(0.0, 0.0, 0.0),
  Array(0.0, 0.0, 0.0),
  Array(0.0, 0.0, 0.0),
  Array(100.0, 100.0, 100.0),
  Array(100.0, 100.0, 100.0),
  Array(100.0, 100.0, 100.0),
  Array(100.0, 100.0, 100.0),
  Array(100.0, 100.0, 100.0),
  Array(1000.0, 1000.0, 1000.0),
  Array(1000.0, 1000.0, 1000.0),
  Array(1000.0, 1000.0, 1000.0),
  Array(1000.0, 1000.0, 1000.0),
  Array(1000.0, 1000.0, 1000.0),
  Array(1000.0, 1000.0, 1000.0))
val fitX = xmeans(arrData, 10)
println("k: " + fitX.k)
println("size: " + fitX.centroids.size)
println("centroids: " + fitX.centroids(0)(0)+"-"+fitX.centroids(0)(1)+"-"+fitX.centroids(0)(2))
println("distortion: " + fitX.distortion)
// `until` excludes the upper bound; `to` would read one element past the end of fitX.y
for (a <- 0 until fitX.y.length) println("y: " + a + " " + fitX.y(a))
I don't understand why it gave the following output. The data clearly falls into three groups (0, 100, and 1000), so I expected three clusters, not a single cluster whose centroid is just the overall mean of each feature. Did I do anything wrong?
k: 1
size: 1
centroids: 406.25-406.25-406.25
distortion: 1.0228125E7
y: 0 0
y: 1 0
y: 2 0
y: 3 0
....
y: 15 0
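As a sanity check on the numbers above, here is a small sketch in plain Scala (no Smile needed) that recomputes the single-cluster centroid and distortion by hand. It assumes that "distortion" means the total squared Euclidean distance from each row to its centroid; that definition is my assumption, not something taken from the Smile docs. The names rows, mean and sse are only for illustration.

```scala
// Same 16 rows as in the question: 5 at 0, 5 at 100, 6 at 1000.
val rows: Array[Array[Double]] =
  Array.fill(5)(Array(0.0, 0.0, 0.0)) ++
  Array.fill(5)(Array(100.0, 100.0, 100.0)) ++
  Array.fill(6)(Array(1000.0, 1000.0, 1000.0))

// Column-wise mean: the centroid of one cluster that holds every row.
val mean: Array[Double] = rows.transpose.map(col => col.sum / col.length)

// Assumed definition of distortion: sum of squared distances to that centroid.
val sse: Double =
  rows.map(r => r.zip(mean).map { case (x, m) => (x - m) * (x - m) }.sum).sum

println("mean: " + mean.mkString("-")) // mean: 406.25-406.25-406.25
println("sse: " + sse)                 // sse: 1.0228125E7
```

Both values match the output, which is consistent with all 16 rows being assigned to one cluster rather than the centroid being miscomputed.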
I also tried another, much longer array (233 rows):
Array(Array(1.2,2.2,3.2)
....
Array(33.4,43.4,53.4)
...
Array(121.1,171.1,221.1))
This time it gave two centroids, so it seems xmeans may need some minimum number of data rows.
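One rough way to test that guess is to keep the same three value groups and only increase the row count. The sketch below does that in plain Scala; repeatRows is a hypothetical helper written for this post, not a Smile function, and the factor 80 is arbitrary. Note that exact duplicates are a special case (zero spread inside each group), so this may not behave the same as real data with noise.

```scala
// Hypothetical helper (not part of Smile): repeat every row `times` times,
// so the value groups stay the same while the number of rows grows.
def repeatRows(data: Array[Array[Double]], times: Int): Array[Array[Double]] =
  data.flatMap(row => Array.fill(times)(row.clone))

val small: Array[Array[Double]] = Array(
  Array(0.0, 0.0, 0.0),
  Array(100.0, 100.0, 100.0),
  Array(1000.0, 1000.0, 1000.0))

val big = repeatRows(small, 80)
println("rows: " + big.length) // rows: 240

// With Smile on the classpath, the same call as in the question can be reused:
// val fitBig = xmeans(big, 10); println("k: " + fitBig.k)
```

If k still comes out as 1 on the larger array, the row count alone is probably not the explanation.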