svm, weka, sentiment-analysis

How to use two datasets in WEKA for sentiment analysis, one for training and one for testing


I have three datasets for sentiment analysis. I want to use only one of them to build the model and the other two for testing. The model is an SVM (the SMO algorithm). At the start, each dataset has only 2 attributes (text, label), but after preprocessing with StringToWordVector it has many attributes. I was able to build a model and evaluate it with 10-fold cross-validation, and now I want to test it on the other datasets. However, StringToWordVector produces different attributes for each dataset, so I can't. Is there a solution to this problem?

I already applied the same preprocessing to the test set and tried using InputMappedClassifier, but I still get an error.

I was hoping the model could be used on datasets it has never seen.


Solution

  • See http://jmgomezhidalgo.blogspot.com/2013/05/mapping-vocabulary-from-train-to-test.html

    If both the training data and the test data are available up front, you can use batch filtering.

    If you don't have the test data in advance, you can use the FilteredClassifier method. See http://jmgomezhidalgo.blogspot.com/2013/01/text-mining-in-weka-chaining-filters.html and http://jmgomezhidalgo.blogspot.com/2013/04/a-simple-text-classifier-in-java-with.html

    Also have a look at the question "How to use StringToWordVector (weka) in java?"
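The batch-filtering idea can be sketched without Weka to show why it removes the attribute mismatch. The dictionary is built from the training texts only and then reused, unchanged, for the test texts, so both sets get exactly the same attributes, and words that appear only in the test set are dropped. In Weka itself, the equivalent is to call `setInputFormat` on the `StringToWordVector` filter with the training set and then run `Filter.useFilter` on both the training set and the test set with that same filter instance. The class `VocabularyMapping` and its methods below are hypothetical. The tokenizer (lower-casing, splitting on non-alphanumerics, binary word presence) is a simplified assumption and does not reproduce Weka's exact behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Hypothetical, library-free sketch of batch filtering: the dictionary comes
 * from the training texts only and is reused as-is for the test texts.
 * This is not Weka code.
 */
public class VocabularyMapping {

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!tok.isEmpty()) {
                tokens.add(tok);
            }
        }
        return tokens;
    }

    /** Builds the dictionary (word -> attribute index) from the training texts only. */
    static Map<String, Integer> buildDictionary(List<String> trainTexts) {
        TreeSet<String> words = new TreeSet<>();
        for (String text : trainTexts) {
            words.addAll(tokenize(text));
        }
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        for (String w : words) {
            dictionary.put(w, dictionary.size());
        }
        return dictionary;
    }

    /** Binary word-presence vector over a fixed dictionary; unknown words are ignored. */
    static int[] vectorize(String text, Map<String, Integer> dictionary) {
        int[] vector = new int[dictionary.size()];
        for (String tok : tokenize(text)) {
            Integer idx = dictionary.get(tok);
            if (idx != null) {
                vector[idx] = 1;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> train = List.of("good movie", "bad plot, bad acting");
        List<String> test = List.of("really good acting");

        // The dictionary is fixed by the training set ...
        Map<String, Integer> dictionary = buildDictionary(train);
        System.out.println(dictionary.keySet()); // prints [acting, bad, good, movie, plot]

        // ... so every test vector has the same length and attribute order.
        for (String text : test) {
            System.out.println(Arrays.toString(vectorize(text, dictionary))); // prints [1, 0, 1, 0, 0]
        }
    }
}
```

The unseen word "really" simply has no attribute, which is the same trade-off batch filtering makes in Weka.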
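The FilteredClassifier approach can also be sketched in plain Java. The vectorizer and the classifier live in one object: training builds the dictionary from the training texts, and prediction accepts raw text, so a dataset the model has never seen needs no separate preprocessing. In Weka, the counterpart is a `FilteredClassifier` configured with `setFilter(new StringToWordVector())` and `setClassifier(new SMO())` and trained on the raw (text, label) data. The class `FilteredTextClassifier` below is hypothetical, and a simple perceptron stands in for SMO only to keep the sketch short and dependency-free. It is not an SVM.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Hypothetical sketch of the FilteredClassifier idea: filter and classifier are
 * trained together, and predict() takes raw text. A perceptron stands in for SMO.
 */
public class FilteredTextClassifier {
    // word -> attribute index, fixed once train() has run
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();
    private double[] weights = new double[0];
    private double bias = 0.0;

    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!tok.isEmpty()) {
                tokens.add(tok);
            }
        }
        return tokens;
    }

    /** Binary word presence over the training dictionary; unseen words are ignored. */
    private double[] vectorize(String text) {
        double[] x = new double[dictionary.size()];
        for (String tok : tokenize(text)) {
            Integer idx = dictionary.get(tok);
            if (idx != null) {
                x[idx] = 1.0;
            }
        }
        return x;
    }

    private double score(double[] x) {
        double s = bias;
        for (int j = 0; j < x.length; j++) {
            s += weights[j] * x[j];
        }
        return s;
    }

    /** Builds the dictionary from the training texts, then trains; labels are +1 or -1. */
    public void train(List<String> texts, List<Integer> labels, int epochs) {
        TreeSet<String> words = new TreeSet<>();
        for (String text : texts) {
            words.addAll(tokenize(text));
        }
        dictionary.clear();
        for (String w : words) {
            dictionary.put(w, dictionary.size());
        }
        weights = new double[dictionary.size()];
        bias = 0.0;
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < texts.size(); i++) {
                double[] x = vectorize(texts.get(i));
                int y = labels.get(i);
                if (y * score(x) <= 0) { // misclassified: perceptron update
                    for (int j = 0; j < weights.length; j++) {
                        weights[j] += y * x[j];
                    }
                    bias += y;
                }
            }
        }
    }

    /** Classifies raw, unprocessed text as +1 or -1. */
    public int predict(String text) {
        return score(vectorize(text)) >= 0 ? 1 : -1;
    }

    public int vocabularySize() {
        return dictionary.size();
    }

    public static void main(String[] args) {
        FilteredTextClassifier clf = new FilteredTextClassifier();
        clf.train(List.of("good great movie", "bad awful movie", "great acting", "awful plot"),
                  List.of(1, -1, 1, -1), 10);
        System.out.println(clf.predict("a great film"));      // prints 1
        System.out.println(clf.predict("awful, just awful")); // prints -1
    }
}
```

Because the dictionary is stored inside the trained object, applying it to the second and third datasets only requires passing their raw texts to `predict`.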