machine-learning classification weka

Alternative to "Weka: training and test set are not compatible"?


The "Weka: training and test set are not compatible" error can be solved with batch filtering, but at the time I train the model I don't have test.arff yet. My problem is caused by the StringToWordVector filter, which I run from the CLI.

So my question is: does the caret package (R) or scikit-learn (Python) provide an alternative for this? Notes: 1. The functionality provided by StringToWordVector is a must. 2. I don't want to retrain my model at test time, because training takes a lot of time.


Solution

  • Given the requirements you mentioned, you can use Weka's FilteredClassifier for both training and testing. I won't repeat what I have already recorded as screencasts here and here.

    The basic idea is not to apply StringToWordVector as a standalone filter, but to set it as the filter inside FilteredClassifier. You then build the model just once and apply it directly to your unlabelled data, without retraining and without running StringToWordVector on the unlabelled data yourself. FilteredClassifier takes care of these concerns for you.
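
To make the idea concrete, here is a minimal plain-Python sketch of the mechanism, not Weka's actual implementation. The class names `BagOfWordsFilter`, `NaiveBayes`, and `FilteredModel` and the toy data are all hypothetical, chosen only for illustration. The point it tries to show is that the word vocabulary is fixed once from the training text and stored with the model, so unseen text is always mapped onto the same attributes and the "not compatible" mismatch cannot arise.

```python
import math
from collections import Counter


class BagOfWordsFilter:
    """Toy stand-in for a StringToWordVector-style filter (hypothetical, not Weka code)."""

    def fit(self, texts):
        # The vocabulary is built from the training texts only.
        words = sorted({w for t in texts for w in t.lower().split()})
        self.vocabulary = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, texts):
        # Words missing from the training vocabulary are dropped, so every
        # vector has exactly the same attributes as the training data.
        vectors = []
        for t in texts:
            vec = [0] * len(self.vocabulary)
            for w in t.lower().split():
                idx = self.vocabulary.get(w)
                if idx is not None:
                    vec[idx] += 1
            vectors.append(vec)
        return vectors


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one smoothing."""

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        n_features = len(vectors[0])
        counts = {c: [1] * n_features for c in self.classes}  # add-one smoothing
        for vec, label in zip(vectors, labels):
            for i, v in enumerate(vec):
                counts[label][i] += v
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.log_like = {
            c: [math.log(v / sum(counts[c])) for v in counts[c]] for c in self.classes
        }
        return self

    def predict(self, vectors):
        predictions = []
        for vec in vectors:
            scores = {
                c: self.log_prior[c] + sum(v * ll for v, ll in zip(vec, self.log_like[c]))
                for c in self.classes
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


class FilteredModel:
    """Bundles the filter with the classifier, in the spirit of FilteredClassifier."""

    def __init__(self, text_filter, classifier):
        self.text_filter = text_filter
        self.classifier = classifier

    def fit(self, texts, labels):
        # The filter is fitted once, on the training data, and kept with the model.
        self.text_filter.fit(texts)
        self.classifier.fit(self.text_filter.transform(texts), labels)
        return self

    def predict(self, texts):
        # Raw text goes in; the stored filter is reused, never refitted.
        return self.classifier.predict(self.text_filter.transform(texts))


# Toy training data (made up for this example).
train_texts = [
    "cheap pills buy now",
    "buy cheap watches",
    "meeting agenda for monday",
    "project meeting notes",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = FilteredModel(BagOfWordsFilter(), NaiveBayes()).fit(train_texts, train_labels)

# Unseen text with words the model never saw ("online", "today", "from", "the").
predictions = model.predict(["buy pills online today", "notes from the meeting"])
print(predictions)  # ['spam', 'ham']
```

If you do want to move to scikit-learn, the same pattern is usually expressed there by chaining a text vectorizer and a classifier in a single pipeline object, which you fit once and then reuse for prediction.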