
How to train a Weka AI on a dataset through Java


I am currently coding an AI with the Weka API for Java. I am using the MNIST handwritten digits dataset to train it. The AI will train on images of handwritten digits and then output whether a given digit is a 0, 1, 2, etc. Each "image" is a 28x28 array in which every value ranges from 0 to 255 and indicates a grayscale colour. I am using some code I wrote to convert the array into the ARFF file format:

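import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public void createArffFromDataset(String pathName, MnistDataset dataset) throws IOException {
    // FileWriter truncates an existing file, so delete()/createNewFile() are not needed.
    // try-with-resources closes the writer even if a write fails.
    try (BufferedWriter writer = new BufferedWriter(new FileWriter(pathName))) {
        writer.append("@RELATION MnistDataset\n");

        // One numeric attribute per pixel, named r<row>c<column> (1-based).
        for (int r = 1; r <= 28; r++) {
            for (int c = 1; c <= 28; c++) {
                writer.append("\n@ATTRIBUTE r").append(String.valueOf(r)).append("c").append(String.valueOf(c)).append(" NUMERIC");
            }
        }
        // Nominal class attribute holding the digit label.
        writer.append("\n@ATTRIBUTE class {0,1,2,3,4,5,6,7,8,9}\n\n@DATA");

        // One data row per image: 784 pixel values followed by the label.
        for (MnistMatrix image : dataset.dataset) {
            writer.append("\n");
            for (int r = 0; r < 28; r++) {
                for (int c = 0; c < 28; c++) {
                    writer.append(String.valueOf(image.matrix[r][c])).append(",");
                }
            }
            writer.append(String.valueOf(image.label));
        }
    }
}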

This writes the dataset to an ARFF file that looks like this:

@RELATION MnistDataset

@ATTRIBUTE r1c1 NUMERIC
@ATTRIBUTE r1c2 NUMERIC
@ATTRIBUTE r1c3 NUMERIC
...
@ATTRIBUTE r28c26 NUMERIC
@ATTRIBUTE r28c27 NUMERIC
@ATTRIBUTE r28c28 NUMERIC

@ATTRIBUTE class {0,1,2,3,4,5,6,7,8,9}

@DATA
(Data goes here)

However, after converting the dataset to the ARFF format, I cannot find out how to train the AI on it. I have looked for documentation, but I cannot find anything that explains which command to use or how to use it. A link to some documentation would be appreciated; however, I am quite new to machine learning, so I may not be able to understand some sources.


Solution

  • How to use the Weka API is documented in the Weka manual (PDF) that comes with your Weka installation.

    Alternatively, have a look at the Weka wiki article "Use Weka in your Java code".

    Finally, instead of creating an ARFF file by writing out text, you should use the Weka API directly to build a weka.core.Instances object. That way you don't need to generate a file at all, because a classifier can be trained directly on the Instances object. See the Weka wiki article "Creating an ARFF file" for how to create the data structures in memory.
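
As a rough illustration of the in-memory approach described above, here is a minimal sketch, assuming Weka 3.7 or later is on the classpath. The class name MnistWekaSketch and the helper methods createHeader and toInstance are hypothetical names made up for this example, and the two "images" in main are synthetic stand-ins rather than real MNIST data. NaiveBayes is used only because it is simple to set up; it is not a recommendation for MNIST, and any weka.classifiers.Classifier could be swapped in.

```java
import java.util.ArrayList;
import java.util.List;

import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Utils;

// Hypothetical sketch class; not part of Weka or of the question's code.
public class MnistWekaSketch {

    static final int SIZE = 28;

    /** Builds an empty dataset: 784 numeric pixel attributes plus a nominal class. */
    public static Instances createHeader(int capacity) {
        ArrayList<Attribute> attributes = new ArrayList<>();
        for (int r = 1; r <= SIZE; r++) {
            for (int c = 1; c <= SIZE; c++) {
                attributes.add(new Attribute("r" + r + "c" + c)); // numeric attribute
            }
        }
        List<String> labels = new ArrayList<>();
        for (int d = 0; d <= 9; d++) {
            labels.add(String.valueOf(d));
        }
        attributes.add(new Attribute("class", labels)); // nominal attribute

        Instances data = new Instances("MnistDataset", attributes, capacity);
        data.setClassIndex(data.numAttributes() - 1); // the last attribute is the class
        return data;
    }

    /** Flattens one image row by row; pass a negative label when the digit is unknown. */
    public static Instance toInstance(int[][] pixels, int label, Instances header) {
        double[] values = new double[header.numAttributes()];
        int k = 0;
        for (int r = 0; r < SIZE; r++) {
            for (int c = 0; c < SIZE; c++) {
                values[k++] = pixels[r][c];
            }
        }
        // A nominal value is stored as its index in the label list.
        // The labels are "0".."9" in order, so the index equals the digit.
        values[k] = label >= 0 ? label : Utils.missingValue();
        Instance instance = new DenseInstance(1.0, values);
        instance.setDataset(header);
        return instance;
    }

    public static void main(String[] args) throws Exception {
        // Synthetic stand-in images: a blank image as "0", a vertical stroke as "1".
        int[][] blank = new int[SIZE][SIZE];
        int[][] stroke = new int[SIZE][SIZE];
        for (int r = 4; r < 24; r++) {
            stroke[r][14] = 255;
        }

        Instances train = createHeader(2);
        train.add(toInstance(blank, 0, train));
        train.add(toInstance(stroke, 1, train));

        Classifier model = new NaiveBayes();
        model.buildClassifier(train); // this call is the training step

        Instance unknown = toInstance(stroke, -1, train);
        double index = model.classifyInstance(unknown); // index into the class labels
        System.out.println("Predicted digit: " + train.classAttribute().value((int) index));
    }
}
```

With the question's own types, each MnistMatrix would presumably be passed as toInstance(image.matrix, image.label, train). If you prefer to keep the ARFF file, weka.core.converters.ConverterUtils.DataSource can load it into an Instances object (new DataSource(path).getDataSet()); you still need to call setClassIndex on the result before training.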