There is an attribute filter that should remove every attribute whose name matches a specified regular expression from a set of Instances.
I have problems with the regex.
I tried several simple expressions, all of which are valid (tested on regexr), but the filter does not seem to accept them.
The relevant code follows.
Instances dataset1_x = new Instances(dataset1);
RemoveByName filterX = new RemoveByName();
filterX.setInputFormat(dataset1_x);
filterX.setInvertSelection(true);
filterX.setExpression(Pattern.quote("^.*i$"));
// filterX.setExpression("^.*i$"); does not work either
Instances dataset1_ = Filter.useFilter(dataset1_x, filterX);
This should match all names ending with an "i".
The resulting dataset is named
"dataset-weka.filters.unsupervised.attribute.StringToNominal-Rlast-weka.filters.unsupervised.attribute.Remove-weka.filters.unsupervised.attribute.RemoveByName-E^.*id$"
Note that ^.*id$ is the default expression, so it has not changed, even though filterX.getExpression() returns the regex I set before.
This usage of the filter also corresponds to several code examples.
The same happens if I set the regex using setOptions().
This affects version 3.9.0 dev as well as 3.8 stable.
In the WEKA GUI, the filter works correctly.
So another guess is that the regex must have a special format when it is set programmatically. Unfortunately, the API documentation does not provide examples.
You need to set the expression and the invertSelection flag before setting the input format.
More generally, I assume that you have to set all options before calling setInputFormat().
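To illustrate why the order could matter, here is a toy model in plain Java. It is not Weka code: ToyRemoveByName and FilterOrderSketch are hypothetical names, and the sketch simply assumes that the filter computes its output format once, inside setInputFormat(), from whatever options are set at that moment. Under that assumption, an expression set afterwards is stored (so the getter returns it) but never used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterOrderSketch {

    /** Hypothetical stand-in for a name-based attribute filter; not a Weka class. */
    static class ToyRemoveByName {
        private String expression = "^.*id$"; // default reported in the question
        private List<String> outputFormat = new ArrayList<>();

        void setExpression(String expression) {
            this.expression = expression;
        }

        String getExpression() {
            return expression;
        }

        // Assumption: the output format is fixed here, using the current options.
        void setInputFormat(List<String> attributeNames) {
            outputFormat = new ArrayList<>();
            for (String name : attributeNames) {
                if (!name.matches(expression)) {
                    outputFormat.add(name); // keep names that do not match
                }
            }
        }

        List<String> getOutputFormat() {
            return outputFormat;
        }
    }

    /** Input format first, expression afterwards: the default expression is used. */
    static List<String> optionsAfter(List<String> names) {
        ToyRemoveByName f = new ToyRemoveByName();
        f.setInputFormat(names);
        f.setExpression("^.*i$");
        return f.getOutputFormat();
    }

    /** Expression first, input format afterwards: the new expression is used. */
    static List<String> optionsFirst(List<String> names) {
        ToyRemoveByName f = new ToyRemoveByName();
        f.setExpression("^.*i$");
        f.setInputFormat(names);
        return f.getOutputFormat();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("userid", "alibi", "name");
        System.out.println(optionsAfter(names)); // [alibi, name]
        System.out.println(optionsFirst(names)); // [userid, name]
    }
}
```

In the first call "userid" is dropped because the default ^.*id$ was still active when the format was computed. This matches the symptom in the question, where the getter shows the new regex while the result still reflects the default.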
The following works:
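import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.RemoveByName;

Instances dataset1_x = new Instances(dataset1);
RemoveByName filterX = new RemoveByName();
// Set all options first, and only then the input format.
filterX.setInvertSelection(true);
filterX.setExpression("^.*i$"); // plain regex; Pattern.quote() would turn it into a literal string
filterX.setInputFormat(dataset1_x);
Instances dataset1_ = Filter.useFilter(dataset1_x, filterX);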
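A side note on Pattern.quote(), which appears in the question's snippet: it wraps the string so that every character is matched literally, which is probably not what is wanted for "names ending with an i". The sketch below uses only java.util.regex and String.matches(); it assumes the filter hands the expression to Java's regex engine in a similar way, which I have not verified against the Weka source. QuoteSketch and its matches() helper are illustrative names.

```java
import java.util.regex.Pattern;

class QuoteSketch {

    /** True if the whole name matches the given regular expression. */
    static boolean matches(String name, String expression) {
        return name.matches(expression);
    }

    public static void main(String[] args) {
        String regex = "^.*i$";
        String quoted = Pattern.quote(regex); // literal form of the same string

        System.out.println(matches("alibi", regex));  // true: name ends with "i"
        System.out.println(matches("alibi", quoted)); // false: only the literal text matches
        System.out.println(matches("^.*i$", quoted)); // true: name equals the literal text
    }
}
```

So with a quoted expression, only an attribute literally named ^.*i$ would match. Passing the plain regex string is the safer choice here.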