Search code examples
javawekaregressionarff

How to create an ARFF file from an array in java?


I want to get the coefficients of a weighted linear regression of an x-y pair represented by two arrays in java. I have zeroed in on weka, but it is asking an 'Instances' class object in the 'LinearRegression' class. To create an 'Instances' class file, an ARFF file is needed which contains the data. I have come across solutions that use the FastVector class but that has now been deprecated in the latest weka version. How do I create an ARFF file for the x-y pair and the corresponding weights all represented by arrays in java?

Here's my code based on Baz's answer. It's giving an exception on the last line "lr.buildClassifier(newDataset)" - Thread [main] (Suspended (exception UnassignedClassException))
Capabilities.testWithFail(Instances) line: 1302 . Here's the code -

public static void test() throws Exception
{
    double[][] data = {{4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0}, {19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0}};

    int numInstances = data[0].length;

    ArrayList<Attribute> atts = new ArrayList<Attribute>();
    List<Instance> instances = new ArrayList<Instance>();
    for(int dim = 0; dim < 2; dim++)
    {
        Attribute current = new Attribute("Attribute" + dim, dim);

        if(dim == 0)
        {
            for(int obj = 0; obj < numInstances; obj++)
            {
                instances.add(new SparseInstance(numInstances));
            }
        }

        for(int obj = 0; obj < numInstances; obj++)
        {
            instances.get(obj).setValue(current, data[dim][obj]);
            //instances.get(obj).setWeight(weights[obj]);
        }
        atts.add(current);
    }

    Instances newDataset = new Instances("Dataset", atts, instances.size());

    for(Instance inst : instances)
        newDataset.add(inst);

    LinearRegression lr = new LinearRegression();

    lr.buildClassifier(newDataset);             
}

Solution

  • I think this might help you:

    FastVector atts = new FastVector();
    List<Instance> instances = new ArrayList<Instance>();
    for(int dim = 0; dim < numDimensions; dim++)
    {
        // Create new attribute / dimension
        Attribute current = new Attribute("Attribute" + dim, dim);
        // Create an instance for each data object
        if(dim == 0)
        {
            for(int obj = 0; obj < numInstances; obj++)
            {
                instances.add(new SparseInstance(numDimensions));
            }
        }
    
        // Fill the value of dimension "dim" into each object
        for(int obj = 0; obj < numInstances; obj++)
        {
            instances.get(obj).setValue(current, data[dim][obj]);
        }
    
        // Add attribute to total attributes
        atts.addElement(current);
    }
    
    // Create new dataset
    Instances newDataset = new Instances("Dataset", atts, instances.size());
    
    // Fill in data objects
    for(Instance inst : instances)
        newDataset.add(inst);
    

    Afterwards Instances is you dataset.

    Note: The current version (3.6.8) of Weka did not complain, even though I used FastVector.

    However, for the Developer version (3.7.7), use this:

    ArrayList<Attribute> atts = new ArrayList<Attribute>();
    List<Instance> instances = new ArrayList<Instance>();
    for(int dim = 0; dim < numDimensions; dim++)
    {
        Attribute current = new Attribute("Attribute" + dim, dim);
        if(dim == 0)
        {
            for(int obj = 0; obj < numInstances; obj++)
            {
                instances.add(new SparseInstance(numDimensions));
            }
        }
    
        for(int obj = 0; obj < numInstances; obj++)
        {
            instances.get(obj).setValue(current, data[dim][obj]);
        }
    
        atts.add(current);
    }
    
    Instances newDataset = new Instances("Dataset", atts, instances.size());
    
    for(Instance inst : instances)
        newDataset.add(inst);