java machine-learning classification weka j48

How can I find the training error, error(D), and the test error, error(s)?


I need to find the training error, error(D), and the test error, error(s).

In theory, error(s) = misclassified instances / total instances, and error(D) is then estimated as error(s) ± z * sqrt(error(s) * (1 - error(s)) / n).
Here n is the total number of instances and z is the constant for the chosen confidence level (for example, 1.96 for 95%).

Now, how can I find the misclassified instances? Is that the same as the "Incorrectly Classified Instances" figure reported after calling evaluateModel on Weka's Evaluation class? Please let me know.

code:

import weka.classifiers.evaluation.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Tree {
    public static void main(String[] args) throws Exception {
        // load the training and test datasets
        DataSource trainsource = new DataSource(".//training data.arff");
        DataSource testsource = new DataSource(".//test data.arff");
        Instances dataset = trainsource.getDataSet();
        Instances datatestset = testsource.getDataSet();
        // set the class index to the last attribute of each dataset
        dataset.setClassIndex(dataset.numAttributes() - 1);
        datatestset.setClassIndex(datatestset.numAttributes() - 1);
        // create the classifier
        J48 tree = new J48();
        // use an unpruned J48
        tree.setUnpruned(true);
        // build the classifier on the training data
        tree.buildClassifier(dataset);
        // evaluate the classifier on the test data and print some statistics
        Evaluation eval = new Evaluation(dataset);
        eval.evaluateModel(tree, datatestset);
        System.out.println(eval.toSummaryString("\nResults\n======\n", true));
    }
}

output:

Results

Correctly Classified Instances         540               22.2772 %
Incorrectly Classified Instances      1884               77.7228 %
Kappa statistic                          0.0644
K&B Relative Info Score              78375.7967 %
K&B Information Score                 1912.8906 bits      0.7891     bits/instance
Class complexity | order 0            7268.6047 bits      2.9986 bits/instance
Class complexity | scheme           725668.4216 bits    299.3682 bits/instance 
Complexity improvement     (Sf)    -718399.8169 bits   -296.3696 bits/instance
Mean absolute error                      0.2186
Root mean squared error                  0.3897
Relative absolute error                 91.6895 %
Root relative squared error            109.0212 %
Total Number of Instances             2424     
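
As a rough illustration of the formula in the question, the sketch below plugs in the two counts from the output above (1884 incorrect out of 2424). The class and method names are made up for this example, and z = 1.96 assumes a 95% confidence level under the normal approximation. In Weka, the two counts could presumably be read from eval.incorrect() and eval.numInstances() rather than hard-coded.

```java
final class ErrorEstimate {
    /** Sample error: fraction of misclassified instances. */
    static double sampleError(double misclassified, double total) {
        return misclassified / total;
    }

    /** Half-width of the interval: z * sqrt(e * (1 - e) / n). */
    static double margin(double error, double n, double z) {
        return z * Math.sqrt(error * (1.0 - error) / n);
    }

    public static void main(String[] args) {
        double incorrect = 1884; // "Incorrectly Classified Instances" from the output
        double total = 2424;     // "Total Number of Instances" from the output
        double z = 1.96;         // assumption: 95% confidence level

        double errorS = sampleError(incorrect, total);
        double m = margin(errorS, total, z);
        System.out.printf("error(s) = %.4f%n", errorS);
        System.out.printf("error(D) is roughly in [%.4f, %.4f]%n", errorS - m, errorS + m);
    }
}
```

With these inputs error(s) comes out at about 0.7772, which matches the 77.7228 % that the summary reports for incorrectly classified instances.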

Solution

  • If you doubt that "Incorrectly Classified Instances" and "misclassified instances" mean the same thing, check the source.

    Reading the Weka source code (fortunately, it is open source) is the only way to learn exactly what it does. Even if I told you "yes, they are the same", that might be correct for one version and wrong for another. So use the source of your version as the authoritative reference.
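
Alongside reading the source, one could also cross-check the reported figure by counting mismatches by hand. The sketch below is only an assumption about how such a check might look: the class name, method name, and sample arrays are hypothetical. With Weka, the two arrays would presumably be filled from instance.classValue() and tree.classifyInstance(instance) for each test instance.

```java
final class MisclassificationCount {
    /** Counts positions where the predicted class index differs from the actual one. */
    static int countMisclassified(double[] actual, double[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int wrong = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != predicted[i]) {
                wrong++;
            }
        }
        return wrong;
    }

    public static void main(String[] args) {
        // hypothetical class indices, standing in for real test-set values
        double[] actual = {0, 1, 2, 1, 0};
        double[] predicted = {0, 2, 2, 1, 1};
        int wrong = countMisclassified(actual, predicted);
        System.out.println(wrong + " of " + actual.length + " misclassified");
    }
}
```

If the hand count agrees with "Incorrectly Classified Instances" for your data and Weka version, that supports treating the two as the same. A mismatch would point to details such as instance weights or unclassified instances, which only the source can settle.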