I am trying to create test and train Instances objects. Some attributes in the training set are not in the test set, and I'm having trouble finding the right way to filter them. I have tried two filters; the code and the errors they produce are below.
Instances rawTraining = new Instances(arffFile);
Instances rawTesting = new Instances(arffFile);
System.out.println("Raw Training Attributes: "+rawTraining.numAttributes());
//Raw Training Attributes: 2446
System.out.println("Raw Testing Attributes: "+rawTesting.numAttributes());
//Raw Testing Attributes: 2381
rawTraining.setClassIndex(rawTraining.numAttributes()-1);
NumericToNominal Filter
NumericToNominal filter = new NumericToNominal();
filter.setAttributeIndicesArray(new int[] {rawTraining.classAttribute().index()});
filter.setInputFormat(rawTraining);
Instances finalTraining = Filter.useFilter(rawTraining, filter);
Instances finalTesting = Filter.useFilter(rawTesting, filter);
Produces the error:
java.lang.IllegalArgumentException: Src and Dest differ in # of attributes: 2381 != 2446
at weka.core.RelationalLocator.copyRelationalValues(RelationalLocator.java:87)
at weka.filters.Filter.copyValues(Filter.java:371)
at weka.filters.Filter.bufferInput(Filter.java:313)
at weka.filters.SimpleBatchFilter.input(SimpleBatchFilter.java:199)
at weka.filters.Filter.useFilter(Filter.java:680)
Standardize Filter
Standardize filter = new Standardize();
filter.setInputFormat(rawTraining);
Instances finalTraining = Filter.useFilter(rawTraining, filter);
Instances finalTesting = Filter.useFilter(rawTesting, filter);
Produces the error:
java.lang.IndexOutOfBoundsException: Index: 2381, Size: 2381
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at weka.core.Instances.attribute(Instances.java:341)
at weka.core.AbstractInstance.attribute(AbstractInstance.java:72)
at weka.filters.unsupervised.attribute.Standardize.convertInstance(Standardize.java:240)
at weka.filters.unsupervised.attribute.Standardize.input(Standardize.java:142)
at weka.filters.Filter.useFilter(Filter.java:680)
How can I make these two Instances objects compatible?
The answer provided here should help address some of your concerns: "Does test file in WEKA require or less number of features as train".
In short, you first need to make sure your training and testing instances have the same attributes, in the same order. Where the test set has no value for an attribute (including the class attribute), you should be able to insert '?', the ARFF marker for a missing value. The code snippets you provided look fine, so I would handle this first and then see what happens.
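To illustrate the idea of giving the test data the same attributes as the training data, here is a minimal sketch in plain Java. It deliberately avoids the Weka API. The class name AlignAttributes, the align method, and the attribute names f1, f2, f3 are all hypothetical, and each row is assumed to be available as a name-to-value map. It rebuilds a test row in the training header's order and writes '?' wherever the test row has no value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AlignAttributes {
    /** ARFF marker for a missing value. */
    static final String MISSING = "?";

    /**
     * Rebuilds one test row so that its values follow the training header.
     * Attributes the test row does not contain are filled with "?".
     * (Hypothetical helper, not part of Weka.)
     */
    public static List<String> align(List<String> trainHeader, Map<String, String> testRow) {
        List<String> aligned = new ArrayList<>(trainHeader.size());
        for (String name : trainHeader) {
            // Look the attribute up by name, not by position.
            aligned.add(testRow.getOrDefault(name, MISSING));
        }
        return aligned;
    }

    public static void main(String[] args) {
        // Training header has four attributes; the test row only knows two of them.
        List<String> trainHeader = List.of("f1", "f2", "f3", "class");
        Map<String, String> testRow = Map.of("f1", "0.5", "f3", "1.2");

        // Prints: 0.5,?,1.2,?
        System.out.println(String.join(",", align(trainHeader, testRow)));
    }
}
```

Rows aligned this way could then be written out under the training file's header, so that both files declare the same attributes before any filter is applied. Test attributes that do not appear in the training header are simply dropped by this sketch.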