
SMILE xmeans gave wrong clustering


I am testing xmeans clustering on this two-dimensional array (16 rows, 3 features) with smile-core 2.4.0.

import smile.clustering.xmeans

val arrData = Array(Array(0.0,0,0),
  Array(0.0,0,0),
  Array(0,0.0,0),
  Array(0,0,0.0),
  Array(0,0,0.0),
  Array(100,100.0,100),
  Array(100,100,100.0),
  Array(100.0,100,100),
  Array(100,100.0,100),
  Array(100,100,100.0),
  Array(1000,1000.0,1000),
  Array(1000,1000,1000.0),
  Array(1000,1000.0,1000),
  Array(1000.0,1000,1000),
  Array(1000,1000.0,1000),
  Array(1000,1000,1000.0))

val fitX = xmeans(arrData, 10)

println("k: " + fitX.k)
println("size: " + fitX.centroids.size)
println("centroids: " + fitX.centroids(0)(0)+"-"+fitX.centroids(0)(1)+"-"+fitX.centroids(0)(2))
println("distortion: " + fitX.distortion)
// `until` excludes the upper bound; `to` would read one index past the end of y
for (a <- 0 until fitX.y.length) println("y: " + a + " " + fitX.y(a))

I don't understand why it gave the following output, since the rows clearly fall into three groups around 0, 100 and 1000. There should be more than one cluster, yet the single centroid is just the average of each of the 3 features. Did I do anything wrong?

k: 1
size: 1
centroids: 406.25-406.25-406.25
distortion: 1.0228125E7
y: 0 0
y: 1 0
y: 2 0
y: 3 0
....
y: 15 0
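
As a sanity check on the numbers above, the single centroid and the distortion can be recomputed without SMILE. This is a minimal sketch in plain Scala; `rows`, `mean` and `sse` are names chosen here for illustration. It assumes the distortion is the total squared Euclidean distance to the centroid, which is what the reported value coincides with for this data.

```scala
// Rebuild the 16 rows from the question: 5 at 0, 5 at 100, 6 at 1000.
val rows: Array[Array[Double]] =
  Array.fill(5)(Array(0.0, 0.0, 0.0)) ++
  Array.fill(5)(Array(100.0, 100.0, 100.0)) ++
  Array.fill(6)(Array(1000.0, 1000.0, 1000.0))

// Column-wise mean: the centroid of one cluster that holds every row.
val dims = rows.head.length
val mean: Array[Double] =
  Array.tabulate(dims)(j => rows.map(_(j)).sum / rows.length)

// Total squared Euclidean distance of all rows to that mean.
val sse: Double =
  rows.map(r => r.zip(mean).map { case (x, m) => (x - m) * (x - m) }.sum).sum

println("mean: " + mean.mkString("-"))
println("sse: " + sse)
```

Both values agree with the output above (406.25 per feature and 1.0228125E7), so the result is exactly what a single cluster over all 16 rows looks like. In other words, no split was accepted; the arithmetic itself is not wrong.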

Solution

  • I just tried another, much longer array (233 rows):

    Array(Array(1.2,2.2,3.2)
    ....
    Array(33.4,43.4,53.4)
    ...
    Array(121.1,171.1,221.1))
    

    It gave two centroids, so xmeans seems to need a minimum number of data rows before it will split a cluster.
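
One way to probe the row-count hypothesis is to keep the same three groups as in the question but generate many more rows per group. The sketch below builds such a dataset in plain Scala. `blobs` and its parameters are hypothetical helpers invented for this example, not part of SMILE, and the jitter and seed are arbitrary choices.

```scala
import scala.util.Random

// Hypothetical helper (not a SMILE API): n jittered 3-D points around each center.
def blobs(centers: Seq[Double], n: Int, jitter: Double, seed: Long): Array[Array[Double]] = {
  val rng = new Random(seed) // fixed seed so the data is reproducible
  centers.flatMap { c =>
    Seq.fill(n)(Array.fill(3)(c + jitter * rng.nextGaussian()))
  }.toArray
}

// 100 rows per group instead of 5 or 6, around the same centers as the question.
val bigger = blobs(Seq(0.0, 100.0, 1000.0), n = 100, jitter = 1.0, seed = 42L)
println("rows: " + bigger.length)
```

Passing `bigger` to `xmeans(bigger, 10)` as in the question, and then lowering `n`, should show whether the number of rows per group is what changes the result. That run is not shown here, so no clustering output is claimed.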