I want to count how many times the comparison between two Strings (the predicted and the actual class label) is true for my training data. However, the code I implemented prints a running count for every matching instance instead of only the total number of matches.
//Load dataset
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class DatasetLoading {
    public static Instances loadData(String location) {
        try {
            return DataSource.read(location);
        }
        catch (Exception e) {
            System.err.println("Failed to load data from: " + location);
            e.printStackTrace();
            return null;
        }
    }
    public static void main(String[] args) {
        String dataLocation = "C:/Users/Emil/Desktop/Machine Learning - Java/Week 1/Arsenal_TRAIN1.arff";
        Instances train = loadData(dataLocation);
        System.out.println(train);
    }
}
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;

public class ForClassifier {
    public static void main(String[] args) throws Exception {
        String train1 = "C:/Users/Emil/Downloads/Week 1/Arsenal_TRAIN.arff";
        Instances train = DatasetLoading.loadData(train1);
        //train data
        train.setClassIndex(train.numAttributes()-1);
        Classifier Model = (Classifier)new NaiveBayes();
        Model.buildClassifier(train);
        int z=0;
        double x = 0;
        String x2 = null;
        for (int i = 0; i < train.numInstances(); i++)
        {
            //return data
            String trueClassLabel = train.instance(i).toString(train.classIndex());
            double predicted = Model.classifyInstance(train.get(i));
            if(predicted == 0.0) {
                x=predicted;
            }else if (predicted == 1.0){
                x=predicted;
            }else if(predicted == 2.0) {
                x=predicted;
            }
            if(x == 0.0) {
                String x1 = "Loss";
                x2 = x1;
            } else if(x == 1.0) {
                String x1 = "Draw";
                x2=x1;
            } else if(x == 2.0) {
                String x1 = "Win";
                x2=x1;
            }
            //System.out.println(x2 + "\t"+trueClassLabel + "\t" + x2.equals(trueClassLabel));
            if(x2.equals(trueClassLabel)) {
                z++;
                System.out.println(z);
            }
        }
    }
}
The output that I get:
1
2
3
4
5
6
7
8
9
10
11
12
13
The expected output:
13
I have also tried getting the maximum value; however, this returns 1 and not 13:
if(x2.equals(trueClassLabel)) {
    z++;
    Integer[] test2= {z};
    for(int j = 0; j<test2.length;j++) {
        if(test2[max] < test2[i]) {
            max=i;
        }
    }
    System.out.println(test2[max]);//1
The data:
@RELATION Arsenal
@ATTRIBUTE Leno {0,1}
@ATTRIBUTE Tierney {0,1}
@ATTRIBUTE Saka {0,1}
@ATTRIBUTE class {Loss,Draw,Win}
@DATA
1, 0, 0, Loss
1, 0, 0, Loss
0, 1, 1, Draw
1, 0, 1, Draw
0, 0, 1, Win
0, 1, 1, Win
1, 1, 1, Win
0, 1, 1, Win
1, 1, 0, Win
1, 0, 1, Win
1, 1, 0, Loss
0, 1, 0, Draw
1, 1, 0, Draw
1, 1, 0, Draw
0, 0, 1, Win
1, 0, 1, Win
0, 1, 1, Win
1, 1, 0, Win
1, 1, 1, Win
1, 1, 0, Win
Instead of comparing strings, why don't you just compare the numeric prediction obtained from classifyInstance with the actual numeric class label from the training data (train.instance(i).classValue())?
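To illustrate why the numeric comparison is enough: for a nominal class attribute, Weka represents each class value as the index of its label in the attribute declaration, so with {Loss,Draw,Win} the values are 0.0, 1.0 and 2.0. The sketch below does not use Weka at all; the class LabelIndexDemo, the LABELS array and the helper methods are hypothetical names invented for this illustration. It only shows that comparing indices gives the same answer as mapping them to labels first and comparing the strings.

```java
class LabelIndexDemo {
    // Hypothetical stand-in for the class attribute's labels,
    // in the same order as the @ATTRIBUTE declaration: {Loss,Draw,Win}.
    static final String[] LABELS = {"Loss", "Draw", "Win"};

    // Maps a numeric class value (an index stored as a double) to its label.
    static String labelOf(double classValue) {
        return LABELS[(int) classValue];
    }

    // The string-based comparison from the question, reduced to its core.
    static boolean sameByLabel(double predicted, double actual) {
        return labelOf(predicted).equals(labelOf(actual));
    }

    // The numeric comparison suggested above.
    static boolean sameByIndex(double predicted, double actual) {
        return predicted == actual;
    }

    public static void main(String[] args) {
        // Both comparisons agree for every combination of class values.
        for (int p = 0; p < LABELS.length; p++) {
            for (int a = 0; a < LABELS.length; a++) {
                System.out.println(labelOf(p) + " vs " + labelOf(a) + ": "
                        + sameByLabel(p, a) + " / " + sameByIndex(p, a));
            }
        }
    }
}
```

Since both checks always agree, the if/else chains that translate 0.0, 1.0 and 2.0 into "Loss", "Draw" and "Win" can be dropped.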
Since you didn't post your full code (the DatasetLoading class is missing), here is a simple rewrite of your code. The class expects the filename of the dataset as its first command-line argument. It evaluates the model in two ways: by manually comparing the predictions, and by using Weka's Evaluation class (which gives you a whole lot more statistics).
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ForClassifier {
    public static void main(String[] args) throws Exception {
        // load dataset
        Instances train = DataSource.read(args[0]);
        train.setClassIndex(train.numAttributes() - 1);
        // build classifier
        Classifier model = new NaiveBayes();
        model.buildClassifier(train);
        // 1. manual evaluation
        System.out.println("manual evaluation");
        int correct = 0;
        int incorrect = 0;
        for (int i = 0; i < train.numInstances(); i++) {
            double actual = train.instance(i).classValue();
            double predicted = model.classifyInstance(train.get(i));
            if (actual == predicted)
                correct++;
            else
                incorrect++;
        }
        System.out.println("- correct: " + correct);
        System.out.println("- incorrect: " + incorrect);
        // 2. using Weka's Evaluation class
        System.out.println("Weka's Evaluation");
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(model, train);
        System.out.println("- correct: " + eval.correct());
        System.out.println("- incorrect: " + eval.incorrect());
    }
}
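The reason the original code printed 1 through 13 is that the println call sat inside the if block within the loop, so the running count was printed on every match. In the rewrite above, the counter is only printed after the loop has finished. A minimal Weka-free sketch of that pattern follows; the class MatchCounter and the two arrays are made-up illustration values, not your predictions.

```java
class MatchCounter {
    // Counts the positions at which the predicted and actual class values agree.
    static int countMatches(double[] predicted, double[] actual) {
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == actual[i]) {
                correct++; // only count here, do not print
            }
        }
        return correct; // the total is available once the loop is done
    }

    public static void main(String[] args) {
        // Made-up class indices (0 = Loss, 1 = Draw, 2 = Win).
        double[] predicted = {0, 1, 2, 2, 1};
        double[] actual    = {0, 2, 2, 2, 0};
        // Printed once, after the loop: prints 3
        System.out.println(countMatches(predicted, actual));
    }
}
```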
BTW: You should never evaluate on the training data, as the results will be overly optimistic (the model has already seen all this data!).
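A common alternative is k-fold cross-validation, where every instance is held out for testing exactly once and the model is trained on the remaining folds. Weka's Evaluation class has a crossValidateModel method for this. The sketch below does not call Weka; it only illustrates how the instance indices could be divided into folds. The class FoldSplitter and its simple round-robin assignment are assumptions made for this illustration (no shuffling or stratification), not how Weka does it internally.

```java
import java.util.ArrayList;
import java.util.List;

class FoldSplitter {
    // Returns, for each fold, the instance indices held out for testing in that fold.
    // Indices are assigned round-robin, so every index lands in exactly one fold.
    static List<List<Integer>> testFolds(int numInstances, int numFolds) {
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < numInstances; i++) {
            folds.get(i % numFolds).add(i);
        }
        return folds;
    }

    public static void main(String[] args) {
        int numInstances = 20; // the dataset in the question has 20 rows
        List<List<Integer>> folds = testFolds(numInstances, 5);
        for (int f = 0; f < folds.size(); f++) {
            int trainSize = numInstances - folds.get(f).size();
            System.out.println("fold " + f + ": test on " + folds.get(f)
                    + ", train on the other " + trainSize + " instances");
        }
    }
}
```

In each round the model never sees the instances it is tested on, which is what makes the resulting estimate more trustworthy than the counts obtained on the training data.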