I have a data set in which 50% of the instances belong to class A and 50% to class B. I want to split it into a training set and a test set. I know the RemovePercentage filter exists, but it ignores the class balance. How do I remove 35% of my data set and still keep a 50/50 class distribution in the training set?
Ok, I've found a way using the filter StratifiedRemoveFolds:
Step 1
Open your data set in the Weka Explorer and choose the supervised instance filter StratifiedRemoveFolds.
Step 2
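Decide on the sizes you want for your training and test sets. If you want both sets to be the same size, set numFolds to 2. Apply the filter. This generates a data set that contains 50% of the instances from the original set. (The filter outputs a single fold, so with numFolds set to 3 this step yields about 33% of the data and step 4 yields the remaining 67%. For 67% training data and 33% test data, save this step's output as the test set and step 4's output as the training set. Because numFolds must be a whole number, an exact 65/35 split is not possible this way; 3 folds is the closest.)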
Step 3
Save this generated set as, e.g., "train.arff". Once the first set is saved, click Undo so that you are back to your full data set.
Step 4
Click on the StratifiedRemoveFolds filter and change the parameter invertSelection from False to True. Apply the filter again: it generates a set as in step 2, but this one contains the other 50% of the data set (everything outside the selected fold).
Step 5
Save this as "test.arff". Now you have a training set and a test set that both preserve your class balance.
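To see why numFolds and invertSelection give these proportions, here is a minimal plain-Java sketch of the idea behind a stratified fold split. It is not Weka's implementation (Weka may order or shuffle instances differently), and the names StratifiedSplitSketch, Split and split are made up for this illustration. Each class is dealt round-robin into numFolds folds; the selected fold plays the role of the normal filter output, and the remainder is what you get with invertSelection set to True.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this does not use or reproduce Weka code.
public class StratifiedSplitSketch {

    /** The selected fold and everything outside it. */
    public static final class Split {
        public final List<String> selectedFold = new ArrayList<>(); // like invertSelection = False
        public final List<String> remainder = new ArrayList<>();    // like invertSelection = True
    }

    /**
     * Deals the instances of each class round-robin into numFolds folds,
     * then separates fold number `fold` (1-based) from the rest.
     * Instances are represented by their class label only.
     */
    public static Split split(List<String> labels, int numFolds, int fold) {
        Map<String, Integer> seenPerClass = new LinkedHashMap<>();
        Split result = new Split();
        for (String label : labels) {
            // Zero-based position of this instance within its own class.
            int position = seenPerClass.merge(label, 1, Integer::sum) - 1;
            if (position % numFolds == fold - 1) {
                result.selectedFold.add(label);
            } else {
                result.remainder.add(label);
            }
        }
        return result;
    }

    /** Counts how many labels in the list equal the given class. */
    public static long count(List<String> labels, String cls) {
        return labels.stream().filter(cls::equals).count();
    }

    public static void main(String[] args) {
        // 30 instances of class A and 30 of class B, interleaved.
        List<String> data = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            data.add("A");
            data.add("B");
        }

        Split s = split(data, 3, 1);
        // Prints: fold: 20 (A=10, B=10)
        System.out.println("fold: " + s.selectedFold.size()
                + " (A=" + count(s.selectedFold, "A") + ", B=" + count(s.selectedFold, "B") + ")");
        // Prints: rest: 40 (A=20, B=20)
        System.out.println("rest: " + s.remainder.size()
                + " (A=" + count(s.remainder, "A") + ", B=" + count(s.remainder, "B") + ")");
    }
}
```

With 3 folds, the selected fold holds a third of the data and the remainder holds two thirds, and both keep the 50/50 class ratio.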