I need to find the training error, error(D), and the test error, error(S).
To find error(S), I use the formula: error(S) = misclassified instances / total instances
Then, to find error(D), I use
error(D) = error(S) ± z * sqrt(error(S) * (1 - error(S)) / n)
where z is the constant for the chosen confidence level and n is the total number of instances.
Now, how can I find the number of misclassified instances? Is it the same as the "Incorrectly Classified Instances" value reported after calling evaluateModel on Weka's Evaluation class? Please let me know.
code:
import weka.classifiers.evaluation.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Tree {
    public static void main(String[] args) throws Exception {
        // load the training and test datasets
        DataSource trainsource = new DataSource(".//training data.arff");
        DataSource testsource = new DataSource(".//test data.arff");
        Instances dataset = trainsource.getDataSet();
        Instances datatestset = testsource.getDataSet();
        // set the class index to the last attribute of each dataset
        dataset.setClassIndex(dataset.numAttributes() - 1);
        datatestset.setClassIndex(datatestset.numAttributes() - 1);
        // create the classifier
        J48 tree = new J48();
        // use an unpruned J48 tree
        tree.setUnpruned(true);
        // build the classifier on the training data
        tree.buildClassifier(dataset);
        // evaluate the classifier on the test data and print some statistics
        Evaluation eval = new Evaluation(dataset);
        eval.evaluateModel(tree, datatestset);
        System.out.println(eval.toSummaryString("\nResults\n======\n", true));
    }
}
output:
Results
Correctly Classified Instances         540               22.2772 %
Incorrectly Classified Instances      1884               77.7228 %
Kappa statistic                          0.0644
K&B Relative Info Score              78375.7967 %
K&B Information Score                 1912.8906 bits      0.7891 bits/instance
Class complexity | order 0            7268.6047 bits      2.9986 bits/instance
Class complexity | scheme           725668.4216 bits    299.3682 bits/instance
Complexity improvement     (Sf)    -718399.8169 bits   -296.3696 bits/instance
Mean absolute error                      0.2186
Root mean squared error                  0.3897
Relative absolute error                 91.6895 %
Root relative squared error            109.0212 %
Total Number of Instances             2424
If you doubt that "incorrectly classified" and "misclassified" mean the same thing, then use the source.
Looking at the Weka source code (fortunately, it is open source) is the only way to learn exactly what it is doing. Even if I told you "yes, it is", that might be correct for one version and wrong for another. So use the source of your version as the authoritative resource.
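Once you have confirmed the count, the formula from the question can be sketched in plain Java as below. This is only an illustration under stated assumptions: the class and method names (ErrorInterval, sampleError, halfWidth) are made up for this example, the counts 1884 and 2424 are taken from the output in the question, and z = 1.96 assumes a 95% confidence level. In a Weka program the two counts could presumably be read from the Evaluation object via incorrect() and numInstances() instead of being hard-coded.

```java
// Hypothetical helper, not part of Weka: computes the sample error and the
// half-width of the confidence interval from the formula in the question.
final class ErrorInterval {

    // error(S) = misclassified instances / total instances
    static double sampleError(double misclassified, double total) {
        return misclassified / total;
    }

    // z * sqrt(error(S) * (1 - error(S)) / n)
    static double halfWidth(double errorS, double n, double z) {
        return z * Math.sqrt(errorS * (1.0 - errorS) / n);
    }

    public static void main(String[] args) {
        double misclassified = 1884; // "Incorrectly Classified Instances" from the output
        double total = 2424;         // "Total Number of Instances" from the output
        double z = 1.96;             // assumption: 95% confidence level

        double errorS = sampleError(misclassified, total);
        double delta = halfWidth(errorS, total, z);

        // error(D) is estimated to lie in [errorS - delta, errorS + delta]
        System.out.printf("error(S) = %.4f%n", errorS);
        System.out.printf("error(D) in [%.4f, %.4f]%n", errorS - delta, errorS + delta);
    }
}
```

The sample error computed this way should match the "Incorrectly Classified Instances" percentage in the summary divided by 100, which is a quick sanity check that the two terms refer to the same count in your Weka version.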