
Weka K-means distance


I am using the Weka library's SimpleKMeans clusterer.

My ARFF file is:

@relation digits

@attribute number numeric

@data
3.708699941635132
3.608700037002563
3.508699893951416
3.808700084686279
3.708699941635132
3.708699941635132
3.708699941635132
3.708699941635132
3.708699941635132
3.408699989318847
3.708699941635132

These values are centroids, and I also have a distance matrix that contains the distances between all centroids (they are computed with a special method, not plain Euclidean distance). How can I pass this distance matrix to the clusterer in code? Currently I am training with this code:

package kmeanstest;

import java.io.BufferedReader;
import java.io.FileReader;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;

public class Kmeanstest {
    public Kmeanstest() throws Exception {
        Instances train;
        // Load the ARFF data; try-with-resources closes the reader automatically
        try (BufferedReader breader = new BufferedReader(new FileReader("data.arff"))) {
            train = new Instances(breader);
        }
        SimpleKMeans kMeans = new SimpleKMeans();
        kMeans.setSeed(10);
        // Must be set before buildClusterer() so that getAssignments() works
        kMeans.setPreserveInstancesOrder(true);
        kMeans.setNumClusters(3);
        kMeans.buildClusterer(train);
        // assignments[i] is the cluster index of the i-th instance
        int[] assignments = kMeans.getAssignments();
        for (int i = 0; i < assignments.length; i++) {
            System.out.println("Instance " + i + " -> Cluster " + assignments[i]);
        }
    }

    public static void main(String[] args) throws Exception {
        new Kmeanstest();
    }
}
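SimpleKMeans has no option for a precomputed distance matrix: k-means has to average attribute values to form its centroids, and its setDistanceFunction() accepts only EuclideanDistance or ManhattanDistance. A method that can work purely from pairwise distances is k-medoids, where each cluster is represented by one of its own items instead of an average. The plain-Java sketch below uses the simple alternating variant of k-medoids (not PAM, and not Weka code); the class name MatrixKMedoids, the starting medoids and the toy 5x5 matrix are hypothetical, and it assumes your matrix is symmetric and holds the distance between every pair of items you want to cluster.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Weka): alternating k-medoids that clusters
 * items using only a precomputed, symmetric distance matrix dist[i][j].
 */
public class MatrixKMedoids {

    /** Returns the cluster index (0..k-1) of each item, in row order. */
    public static int[] cluster(double[][] dist, int[] initialMedoids, int maxIter) {
        int n = dist.length;
        int k = initialMedoids.length;
        int[] medoids = initialMedoids.clone();
        int[] assign = new int[n];
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each item joins the cluster of its nearest medoid.
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (dist[i][medoids[c]] < dist[i][medoids[best]]) {
                        best = c;
                    }
                }
                assign[i] = best;
            }
            // Update step: the new medoid of each cluster is the member with the
            // smallest total distance to the other members of that cluster.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                int bestMember = medoids[c];
                double bestCost = Double.MAX_VALUE;
                for (int i = 0; i < n; i++) {
                    if (assign[i] != c) {
                        continue;
                    }
                    double cost = 0;
                    for (int j = 0; j < n; j++) {
                        if (assign[j] == c) {
                            cost += dist[i][j];
                        }
                    }
                    if (cost < bestCost) {
                        bestCost = cost;
                        bestMember = i;
                    }
                }
                if (bestMember != medoids[c]) {
                    medoids[c] = bestMember;
                    changed = true;
                }
            }
            // Stop once no medoid moves; the assignments are then stable.
            if (!changed) {
                break;
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        // Toy matrix: items 0-2 are close to each other, as are items 3-4.
        double[][] dist = {
            {0, 1, 2, 9, 8},
            {1, 0, 1, 8, 9},
            {2, 1, 0, 9, 9},
            {9, 8, 9, 0, 1},
            {8, 9, 9, 1, 0},
        };
        int[] labels = cluster(dist, new int[] {0, 3}, 100);
        System.out.println(Arrays.toString(labels)); // prints [0, 0, 0, 1, 1]
    }
}
```

Replacing the toy matrix with your own (and picking k starting medoids) yields cluster labels in the same order as the matrix rows, much like getAssignments() in the code above.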

Solution

  • In my project I got similar results when comparing these distance functions (in my case I have 40,000 instances and 10 features).

    However, if you work with more than two features, it is better to create your own distance function (for example, a Hamilton distance; I strongly believe it will give much better results).

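    import weka.core.ManhattanDistance;

    // kMeans is the SimpleKMeans object from the question
    ManhattanDistance manhattan = new ManhattanDistance();
    try {
        // Call before buildClusterer(); SimpleKMeans throws an Exception for
        // any distance function other than Euclidean or Manhattan
        kMeans.setDistanceFunction(manhattan);
    } catch (Exception e) {
        e.printStackTrace();
    }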
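Switching to ManhattanDistance changes more than the distance: according to Weka's SimpleKMeans documentation, with the Manhattan distance the centroids are computed as the component-wise median rather than the mean. The plain-Java sketch below (the class name CentroidRules is hypothetical; this is not Weka code) computes both rules for the 11 values from the question's ARFF file, treated as a single cluster.

```java
import java.util.Arrays;

/** Hypothetical helper comparing the two centroid rules on one numeric attribute. */
public class CentroidRules {

    /** Mean: the value minimizing the sum of squared (Euclidean) distances. */
    public static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Median: a value minimizing the sum of absolute (Manhattan) distances. */
    public static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // The 11 values from the question's ARFF file, treated as one cluster.
        double[] values = {
            3.708699941635132, 3.608700037002563, 3.508699893951416,
            3.808700084686279, 3.708699941635132, 3.708699941635132,
            3.708699941635132, 3.708699941635132, 3.708699941635132,
            3.408699989318847, 3.708699941635132
        };
        System.out.println("mean   = " + mean(values));
        System.out.println("median = " + median(values));
    }
}
```

Here the median stays at 3.708699941635132 (the value shared by 7 of the 11 instances), while the mean is pulled down to about 3.663 by the lower values, so the choice of distance function also moves the centroids.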