Tags: java, weka, nearest-neighbor

How to attach a Weka instance to an Object?


I'm able to create Instances from a list of objects using the Weka library (https://weka.sourceforge.io/doc.dev/overview-summary.html), like below:

    import java.util.ArrayList;
    import java.util.List;

    import weka.core.Attribute;
    import weka.core.DenseInstance;
    import weka.core.Instance;
    import weka.core.Instances;

    public static Instances createWekaInstances(List<Ticket> tickets, String name) {
        // Create the numeric attributes "x" and "y"
        Attribute x = new Attribute("x"); // sqrt of row position
        Attribute y = new Attribute("y"); // section cv
        // Collect the attributes in a list
        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(x);
        attributes.add(y);
        // Create the empty dataset "ticketInstances" with the above attributes
        Instances ticketInstances = new Instances(name, attributes, 0);

        // Mark the last attribute as the class attribute
        ticketInstances.setClassIndex(ticketInstances.numAttributes() - 1);

        for (Ticket ticket : tickets) {
            // Create an empty instance with one slot per attribute
            Instance inst = new DenseInstance(ticketInstances.numAttributes());
            // Set the instance's values for the attributes "x" and "y"
            inst.setValue(x, Math.sqrt(ticket.getRowPosition()));
            inst.setValue(y, ticket.getSectionCVS());
            // Tell the instance which dataset it belongs to
            inst.setDataset(ticketInstances);
            // Add the instance to the dataset
            ticketInstances.add(inst);
        }
        return ticketInstances;
    }

I'm able to run a nearest neighbor search to find the K nearest neighbors of any instance I want, using https://weka.sourceforge.io/doc.dev/weka/core/neighboursearch/NearestNeighbourSearch.html:

    Instances neighbors = tree.kNearestNeighbours(ticketInstances.get(indexToSearch), 2);

However, it returns a list of 2 instances, where each instance looks like {0 2.44949,1 0.4}, so there is no way for me to associate it with my object. Is there a "Weka" way of attaching an ID or something similar, so that I can tell which object in this list of instances is nearest to the target object?

UPDATE

Okay, doing this seems to work for my use case:

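    BallTree bTree = new BallTree();
    try {
        EuclideanDistance euclideanDistance = new EuclideanDistance();
        euclideanDistance.setDontNormalize(true);
        // Skip the first attribute (the ID) when computing distances
        euclideanDistance.setAttributeIndices("2-last");
        euclideanDistance.setInstances(dataset);
        // Set the distance function before the instances, so that the
        // tree is built with it rather than with the default one
        bTree.setDistanceFunction(euclideanDistance);
        bTree.setInstances(dataset);
    } catch (Exception e) {
        e.printStackTrace();
    }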

Solution

  • Weka has no concept of unique IDs for weka.core.Instance objects. Instead, you need to create an additional attribute that allows you to identify your rows (e.g., the ticket ID or a numeric attribute with unique values).

    You can use the AddID filter to add a numeric attribute to your dataset that will contain such an ID, as mentioned in the Weka wiki article on Instance ID.

    From your code, it seems that you are using the nearest neighbor search on its own, without any classifier or clusterer involved. (For those, you would use the FilteredClassifier/FilteredClusterer approach to remove the ID attribute from the data that is used for building the model.) You therefore need to tell the DistanceFunction which attributes to use for the distance calculation, by supplying an attribute range to its setAttributeIndices(String) method. If your ID attribute is the first one, then you would use 2-last.
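
The approach described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name NearestTicketDemo, the method nearestLabels, the "ticket-N" labels and the four sample rows are all made up for the example and stand in for your own Ticket objects. The ID attribute is added by hand as the first attribute (instead of using the AddID filter), excluded from the distance via "2-last", and then read back from each neighbor to look up the original object in a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.EuclideanDistance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.neighboursearch.BallTree;

public class NearestTicketDemo {

    /** Returns the labels of the k rows closest to the point (qx, qy). */
    public static List<String> nearestLabels(double qx, double qy, int k) throws Exception {
        // Hypothetical sample rows: {id, x, y}. In real code these come from Ticket objects.
        double[][] rows = { {1, 0.0, 0.0}, {2, 1.0, 0.0}, {3, 5.0, 5.0}, {4, 6.0, 5.0} };

        // Dataset header: the ID comes first, followed by the attributes used for the distance.
        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("id"));
        attributes.add(new Attribute("x"));
        attributes.add(new Attribute("y"));
        Instances data = new Instances("tickets", attributes, rows.length);

        // Lookup table from ID back to the original object (plain labels here).
        Map<Integer, String> ticketsById = new HashMap<>();
        for (double[] row : rows) {
            data.add(new DenseInstance(1.0, row));
            ticketsById.put((int) row[0], "ticket-" + (int) row[0]);
        }

        // Distance over attributes 2..last only, so the ID does not influence the search.
        EuclideanDistance distance = new EuclideanDistance();
        distance.setDontNormalize(true);
        distance.setAttributeIndices("2-last");

        // Set the distance function first, then the data, so the tree is built with it.
        BallTree tree = new BallTree();
        tree.setDistanceFunction(distance);
        tree.setInstances(data);

        // The query needs a value in the ID slot, but that value is ignored by the distance.
        Instance query = new DenseInstance(1.0, new double[] {0, qx, qy});
        query.setDataset(data);

        // Read the ID attribute of each neighbor and map it back to the object.
        Instances neighbours = tree.kNearestNeighbours(query, k);
        List<String> result = new ArrayList<>();
        for (Instance neighbour : neighbours) {
            int id = (int) neighbour.value(0);
            result.add(ticketsById.get(id));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(nearestLabels(0.9, 0.1, 2));
    }
}
```

The query here is a fresh instance rather than one taken from the dataset, so it cannot itself show up among the neighbors. With your own data, you would fill ticketsById with the Ticket objects keyed by whatever unique value you store in the ID attribute.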