weka, libsvm, document-classification

How to use the same StringToWordVector filter for training data and unseen data


I used the LibSVM wrapper for Weka and successfully built a classifier for news classification (Sports and Business). I evaluated it with cross-validation and the accuracy is acceptable. Now I need to classify a new news article using the model. Before passing it to the classifier, I need to convert it to a feature vector using Weka's StringToWordVector filter. However, I need to use the same filter that I used for the training data. How can I achieve that?


Solution

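  • Use Weka's batch filtering: initialise the filter once with the training data, then pass both the training data and the unseen data through that same filter instance, as given below,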

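     import weka.core.Instances;
     import weka.filters.Filter;
     import weka.filters.unsupervised.attribute.StringToWordVector;

     StringToWordVector filter = new StringToWordVector(); // initialise the filter
     // set filter options here, before calling setInputFormat
     filter.setInputFormat(trainingData); // set input format to filter using training data
     Instances trainingDataFiltered = Filter.useFilter(trainingData, filter); // first batch: builds the dictionary
     Instances testDataFiltered = Filter.useFilter(testData, filter); // later batch: reuses the same dictionary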
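
To see why the same filter instance has to be reused, here is a minimal plain-Java sketch of the idea behind batch filtering. It is not Weka's implementation: the class `FixedVocabularyVectorizer` and its methods are hypothetical names made up for illustration, and tokenisation is simplified to lowercasing and splitting on whitespace. The point it shows is that the dictionary is fixed by the training documents, so an unseen document is mapped onto the same attribute positions and words outside the dictionary are ignored.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Weka): a tiny bag-of-words vectorizer
// whose dictionary is fixed by the training documents.
class FixedVocabularyVectorizer {
    // word -> attribute index, in order of first appearance in the training data
    private final Map<String, Integer> wordIndex = new LinkedHashMap<>();

    // Build the dictionary from the training documents only.
    FixedVocabularyVectorizer(List<String> trainingDocs) {
        for (String doc : trainingDocs) {
            for (String word : tokenize(doc)) {
                if (word.isEmpty()) continue;
                wordIndex.putIfAbsent(word, wordIndex.size());
            }
        }
    }

    // Simplified tokenizer: lowercase, split on whitespace.
    private static String[] tokenize(String doc) {
        return doc.toLowerCase().trim().split("\\s+");
    }

    // Map any document (training or unseen) onto the training dictionary.
    // Words that were not seen during training are ignored.
    int[] transform(String doc) {
        int[] counts = new int[wordIndex.size()];
        for (String word : tokenize(doc)) {
            Integer idx = wordIndex.get(word);
            if (idx != null) {
                counts[idx]++;
            }
        }
        return counts;
    }

    int vocabularySize() {
        return wordIndex.size();
    }
}
```

If a second vectorizer were built from the unseen article instead, its dictionary and attribute order would differ from the ones the classifier was trained on, which is the mismatch batch filtering avoids.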