java machine-learning cluster-analysis moa

Java MOA Clustream WithKmeans: null center and zero radius and weight for BOTH micro- and macro-clusters


First of all: I'm using moa-release-2019.05.0-bin/moa-release-2019.05.0/lib/moa.jar in my Java project.

Now, to the point: I am trying to use the moa.clusterers.clustream.WithKmeans stream clustering algorithm, but every cluster it returns has a null center and a radius and weight of 0, and I have no idea why.

I am new to MOA and I am having a hard time figuring out how the clustering algorithms are meant to be used. The documentation lacks sample code for common use cases, the implementation is not well explained, and I have not found any tutorials either.

  • My code:
import com.yahoo.labs.samoa.instances.DenseInstance;
import moa.cluster.Clustering;
import moa.clusterers.clustream.WithKmeans;

public class TestingClustream {
    static DenseInstance randomInstance(int size) {
        DenseInstance instance = new DenseInstance(size);
        for (int idx = 0; idx < size; idx++) {
            instance.setValue(idx, Math.random());
        }
        return instance;
    }

    public static void main(String[] args) {
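        WithKmeans wkm = new WithKmeans();
        wkm.kOption.setValue(5);               // k for the macro k-means (number of macro-clusters)
        wkm.maxNumKernelsOption.setValue(300); // maximum number of micro-clusters (kernels)
        wkm.resetLearningImpl();
        for (int i = 0; i < 10000; i++) {
            wkm.trainOnInstanceImpl(randomInstance(2)); // random 2-D point in [0, 1)
        }
        Clustering clusteringResult = wkm.getClusteringResult();           // macro-clusters
        Clustering microClusteringResult = wkm.getMicroClusteringResult(); // micro-clusters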
    }
}

  • Info from the debugger:

(two debugger screenshots, not preserved)

I have read the source code many times, and as far as I can tell I am calling the correct methods in the correct order. I do not know what I am missing; any feedback is welcome!


Solution

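  • Make sure you have fed the algorithm enough data: it buffers the first incoming instances and only initializes its micro-clusters once that buffer is full.

    The center, radius and weight fields you see in the debugger are unused; they are likely inherited from a parent class (such as SphereCluster) that uses them for a different purpose.

    Use the getter methods instead, such as getCenter(), getRadius() and getWeight(). getCenter(), for example, computes the current center from the running sum of the absorbed points (the linear sum divided by the point count).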
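A minimal sketch of the fix above, assuming the same moa.jar (2019.05.0) is on the classpath. It feeds well over maxNumKernelsOption instances, uses prepareForUse() and trainOnInstance() (as I read the MOA source, the public wrappers around resetLearningImpl() and trainOnInstanceImpl()), and then reads every cluster through getCenter() and getWeight(), plus getRadius() for SphereCluster subclasses, instead of inspecting the raw fields. The class name ClustreamGetterDemo and the printClusters helper are hypothetical.

```java
import com.yahoo.labs.samoa.instances.DenseInstance;
import moa.cluster.Cluster;
import moa.cluster.Clustering;
import moa.cluster.SphereCluster;
import moa.clusterers.clustream.WithKmeans;

import java.util.Arrays;

// Hypothetical demo class: same setup as the question, but values are read via getters.
public class ClustreamGetterDemo {

    // Same random instance generator as in the question: each value is in [0, 1).
    static DenseInstance randomInstance(int size) {
        DenseInstance instance = new DenseInstance(size);
        for (int idx = 0; idx < size; idx++) {
            instance.setValue(idx, Math.random());
        }
        return instance;
    }

    // Hypothetical helper: prints each cluster using the getters, not the unused fields.
    static void printClusters(String label, Clustering clustering) {
        System.out.println(label + ": " + clustering.size() + " clusters");
        for (int i = 0; i < clustering.size(); i++) {
            Cluster c = clustering.get(i);
            // getRadius() is declared on SphereCluster, not on the Cluster base class.
            double radius = (c instanceof SphereCluster)
                    ? ((SphereCluster) c).getRadius()
                    : Double.NaN;
            System.out.printf("  center=%s radius=%.4f weight=%.1f%n",
                    Arrays.toString(c.getCenter()), radius, c.getWeight());
        }
    }

    public static void main(String[] args) {
        WithKmeans wkm = new WithKmeans();
        wkm.kOption.setValue(5);               // number of macro-clusters
        wkm.maxNumKernelsOption.setValue(300); // maximum number of micro-clusters
        // Assumption: prepareForUse() applies the options and resets the learner.
        wkm.prepareForUse();

        // Feed far more instances than maxNumKernelsOption so the buffer fills
        // and the micro-clusters get initialized.
        for (int i = 0; i < 10000; i++) {
            wkm.trainOnInstance(randomInstance(2));
        }

        printClusters("micro", wkm.getMicroClusteringResult());
        printClusters("macro", wkm.getClusteringResult());
    }
}
```

Calling the public wrappers rather than the *Impl methods keeps MOA's own bookkeeping (such as the count of training weight seen) consistent; the *Impl methods appear to be hooks for subclasses rather than the intended entry points.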