Last semester I had a project in which, given a dataset about cars, I had to build a model and use it to make predictions from user-entered data (it involved a GUI and so on). The professor introduced Weka, but only through its GUI. I'm now recreating the project using the Weka library directly. Here is the class in question:
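import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
public class TreeModel {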
private J48 model = new J48();
private String[] options = new String[1];
private DataSource source;
private Instances data;
private Evaluation eval;
// Constructor
public TreeModel(String file) throws Exception {
source = new DataSource(file);
// By default, use the '-U' option to produce an unpruned tree
options[0] = "-U";
data = source.getDataSet();
model.setOptions(options);
}
// Overloaded constructor allowing you to choose options for the model
public TreeModel(String file, String[] options) throws Exception {
source = new DataSource(file); // assign the field instead of shadowing it with a local
data = source.getDataSet();
model.setOptions(options);
}
// Builds the decision tree
public void buildDecisionTree() throws Exception {
data.setClassIndex(data.numAttributes() - 1);
model.buildClassifier(data);
}
/*
* Uses cross-validation to estimate accuracy.
* This gives a more reliable estimate of how the model will perform
* on instances that are not in the dataset.
*/
public void crossValidatedEvaluation(int folds) throws Exception {
eval = new Evaluation(data);
eval.crossValidateModel(model, data, folds, new Random());
System.out.println("The model predicted "+eval.pctCorrect()+" percent of the data correctly.");
}
/*
* Evaluates the decision tree on the same data it was trained on.
* Treat this with skepticism: testing on the training data gives an optimistic accuracy.
*/
public void evaluateModel() throws Exception {
eval = new Evaluation(data);
eval.evaluateModel(model, data);
System.out.println("The model predicted "+eval.pctCorrect()+" percent of the data correctly.");
}
/*
* Returns a prediction for a single instance. Intended to eventually take an
* instance as a parameter; for now it reads the first instance from ./predict.arff.
*/
public String getPrediction() throws Exception {
DataSource predFile = new DataSource("./predict.arff");
Instances pred = predFile.getDataSet();
// The class index must be set before the instance is classified
pred.setClassIndex(pred.numAttributes() - 1);
Instance predic = pred.get(0);
double classify = model.classifyInstance(predic);
// Map the predicted class index back to its label instead of hardcoding attribute 6
return pred.classAttribute().value((int) classify);
}
// Returns source code version of the model (warning: messy code)
public String getModelSourceCode() throws Exception {
return model.toSource("DecisionTree");
}
}
My getPrediction() method shows a simple example of getting a prediction for an instance stored in an ARFF file. The problem is that I can't figure out how to create a single Instance object and fill it with the data I want a prediction for. I looked through the documentation for the Instance class but didn't see anything at first glance. Is there a way to put data into an instance manually, or will I need to convert my prediction data into an ARFF file?
This code snippet shows how to build your own set of instances without an ARFF file. It creates a new set of instances from an array, using two numeric attributes: latitude and longitude.
import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
public class AttTest {
public static void main(String[] args) throws Exception
{
// Row 0 holds latitudes, row 1 holds longitudes; column i becomes one instance
double[] one={0,1,2,3};
double[] two={3,2,1,0};
double[][] both=new double[2][4];
both[0]=one;
both[1]=two;
Instances to_use=AttTest.buildArff(both);
System.out.println(to_use.toString());
}
public static Instances buildArff(double[][] array) throws Exception
{
// 1. define the attributes (numeric, since no list of values is given)
// ArrayList replaces the deprecated FastVector in Weka 3.7+
ArrayList<Attribute> atts = new ArrayList<Attribute>();
atts.add(new Attribute("lat")); //latitude
atts.add(new Attribute("lon")); //longitude
// 2. create an empty Instances object with that header
Instances test = new Instances("location", atts, 0);
// 3. fill with data: one DenseInstance (weight 1.0) per column of the array
for(int s1=0; s1 < array[0].length; s1++)
{
double[] vals = new double[test.numAttributes()];
vals[0] = array[0][s1];
vals[1] = array[1][s1];
test.add(new DenseInstance(1.0, vals));
}
return test;
}
}
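The same idea also covers the question's case of a single instance built from user input. Here is a minimal sketch, assuming Weka 3.7 or later and a car-style dataset with nominal attributes. The attribute names and values (`buying`, `safety`, `unacc`/`acc`), the class `SingleInstanceDemo`, and its helper methods are hypothetical. The key steps are: create a `DenseInstance` sized to the header, attach it to the header with `setDataset` so nominal values can be set by label, leave the class missing, and pass it to `classifyInstance`.

```java
import java.util.ArrayList;
import java.util.Arrays;

import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

public class SingleInstanceDemo {

    // Hypothetical header: two nominal inputs plus a nominal class (last attribute).
    // Names and values are illustrative, not taken from the real car dataset.
    public static Instances buildHeader() {
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("buying", Arrays.asList("low", "med", "high")));
        atts.add(new Attribute("safety", Arrays.asList("low", "med", "high")));
        atts.add(new Attribute("class", Arrays.asList("unacc", "acc")));
        Instances header = new Instances("cars", atts, 0);
        header.setClassIndex(header.numAttributes() - 1);
        return header;
    }

    // Builds one instance from string labels given in header order (class excluded).
    // The class value is left missing, as it would be for a prediction.
    public static Instance makeInstance(Instances header, String... values) {
        if (values.length != header.numAttributes() - 1) {
            throw new IllegalArgumentException("expected one value per non-class attribute");
        }
        Instance inst = new DenseInstance(header.numAttributes()); // all values start missing
        inst.setDataset(header); // lets setValue map nominal labels to their indices
        int v = 0;
        for (int i = 0; i < header.numAttributes(); i++) {
            if (i == header.classIndex()) {
                continue;
            }
            inst.setValue(i, values[v++]);
        }
        inst.setClassMissing();
        return inst;
    }

    // Classifies one set of input labels and returns the predicted class label.
    public static String predict(J48 tree, Instances header, String... values) throws Exception {
        double idx = tree.classifyInstance(makeInstance(header, values));
        return header.classAttribute().value((int) idx);
    }

    public static void main(String[] args) throws Exception {
        // Tiny made-up training set, only to have a model to query
        Instances train = buildHeader();
        String[][] rows = {
            {"low", "high", "acc"}, {"med", "high", "acc"},
            {"high", "low", "unacc"}, {"med", "low", "unacc"}
        };
        for (String[] r : rows) {
            Instance inst = makeInstance(train, r[0], r[1]);
            inst.setClassValue(r[2]);
            train.add(inst);
        }
        J48 tree = new J48();
        tree.buildClassifier(train);
        System.out.println("Predicted: " + predict(tree, train, "low", "high"));
    }
}
```

In the question's TreeModel, the header would presumably be the loaded training `data` (with its class index already set), so values chosen in the GUI could be turned into an instance this way instead of being written to an ARFF file first.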