weka, sample, sampling

Stratified sampling not working correctly in Weka


I have a dataset to which I applied the StringToWordVector and Remove filters, and then removed one fold using StratifiedRemoveFolds.

This is the sample I got. My random seed is 0.

[screenshot of the resulting sample]

However, when I chained StringToWordVector with an attributeEval filter and then removed one fold, I got this sample instead.

[screenshot of the resulting sample]

How do I ensure that both folds contain the same instances? I am fine with following either sample.

I am trying to compare the effectiveness of feature selection, and I cannot do that if the test sets differ.


Solution

  • I have found a workaround. First, I split the dataset into folds and saved them as train/test ARFF files. Then I applied the Remove filter to the dataset, which results in a stratified sample as above.
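
The idea behind this workaround (decide fold membership once on the untouched data, then run each filter chain on the saved splits) can be sketched in a few lines. This is only a language-neutral illustration in plain Python, not Weka code: `stratified_fold_ids`, `split`, the toy `rows`, and the two stand-in pipelines are all hypothetical names made up for the example, and the fold assignment is a simple round-robin per class that is not meant to reproduce what Weka's filter does internally.

```python
import random
from collections import defaultdict


def stratified_fold_ids(labels, num_folds, seed=0):
    """Map each row index to a fold, spreading every class evenly over the folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_of = {}
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # seeded, so the assignment is repeatable
        for position, idx in enumerate(indices):
            fold_of[idx] = position % num_folds
    return fold_of


def split(rows, fold_of, test_fold):
    """Return (train, test) using the precomputed fold of each row index."""
    train = [row for i, row in enumerate(rows) if fold_of[i] != test_fold]
    test = [row for i, row in enumerate(rows) if fold_of[i] == test_fold]
    return train, test


# Hypothetical toy data: (text, class label) pairs.
rows = [("doc%d" % i, "spam" if i % 3 == 0 else "ham") for i in range(30)]
labels = [label for _, label in rows]

# Step 1: fix the folds on the raw data, before any filtering.
fold_of = stratified_fold_ids(labels, num_folds=10, seed=0)
train, test = split(rows, fold_of, test_fold=0)


# Step 2: apply each feature pipeline to the already-fixed splits.
def pipeline_a(row):
    return (row[0].upper(), row[1])  # stand-in for "no feature selection"


def pipeline_b(row):
    return (len(row[0]), row[1])  # stand-in for "with feature selection"


test_a = [pipeline_a(r) for r in test]
test_b = [pipeline_b(r) for r in test]
```

Because the split depends only on the row index and the seed, both pipelines see exactly the same test instances, whatever they do to the attributes afterwards.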