
Weka: Train and test set are not compatible


I'm trying to classify some web posts using Weka and a Naive Bayes classifier.

First I manually classified a set of posts (about 100 negative and 100 positive) and created an .arff file of this form:

@relation classtest
@attribute 'post' string
@attribute 'class' {positive,negative}
@data
'RT @burnreporter: Google has now indexed over 30 trillion URLs. Wow. #LeWeb',positive
'A special one for me  Soundcloud at #LeWeb ',positive
'RT @dianaurban: Lost Internet for 1/2 hour at a conference called #LeWeb. Ironic, yes?',negative
   .
   .
   .

Then I open the Weka Explorer, load that file, and apply the StringToWordVector filter to split the posts into single-word attributes.
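StringToWordVector builds its list of word attributes from the posts it is given. The plain-Java sketch below (no Weka involved) imitates that step to show the consequence. It is not Weka's implementation. The `WordVectorSketch` class, its crude tokenizer and its 0/1 encoding are hypothetical simplifications. It "filters" two training posts and one test post from the sample above separately, and prints how many word attributes each file ends up with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Simplified, hypothetical stand-in for StringToWordVector (NOT Weka code):
// the word dictionary, and therefore the list of attributes, is built from
// whatever posts the filter is given.
public class WordVectorSketch {

    // Crude tokenizer (an assumption): lower-case, then split on anything
    // that is not a letter, digit, '#' or '@'.
    static List<String> tokens(String post) {
        List<String> result = new ArrayList<>();
        for (String t : post.toLowerCase().split("[^a-z0-9#@]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Sorted, de-duplicated word list: one "attribute" per distinct word.
    static List<String> dictionary(List<String> posts) {
        TreeSet<String> words = new TreeSet<>();
        for (String post : posts) {
            words.addAll(tokens(post));
        }
        return new ArrayList<>(words);
    }

    // 0/1 vector over a dictionary: 1 if the post contains that word.
    static int[] vectorize(String post, List<String> dictionary) {
        List<String> words = tokens(post);
        int[] vector = new int[dictionary.size()];
        for (int i = 0; i < dictionary.size(); i++) {
            vector[i] = words.contains(dictionary.get(i)) ? 1 : 0;
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> train = List.of(
                "RT @burnreporter: Google has now indexed over 30 trillion URLs. Wow. #LeWeb",
                "A special one for me  Soundcloud at #LeWeb ");
        List<String> test = List.of(
                "RT @dianaurban: Lost Internet for 1/2 hour at a conference called #LeWeb. Ironic, yes?");

        // Filtering each file on its own, as in the Explorer workflow above.
        List<String> trainDict = dictionary(train);
        List<String> testDict = dictionary(test);

        // The two attribute lists differ in size and content, so vectors from
        // one file do not line up with vectors from the other.
        System.out.println("train attributes: " + vectorize(train.get(0), trainDict).length);
        System.out.println("test attributes:  " + vectorize(test.get(0), testDict).length);
        System.out.println("same attributes? " + trainDict.equals(testDict));
    }
}
```

A word such as "google" appears only in the training file, so it becomes an attribute there and nowhere in the test file. That is the kind of header mismatch Weka refuses to evaluate.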

After loading my test dataset and applying the same filter to it, I select the Naive Bayes classifier in the Classify tab, choose "Supplied test set", and Weka returns "Train and test set are not compatible". What can I do? Thanks!


Solution

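  • Probably the attributes of the two filtered sets differ, in their ordering or even in which words they contain. StringToWordVector builds its word dictionary from the data it is applied to, so filtering the train and test files separately gives each file its own set of word attributes.

    You can avoid this with batch filtering: initialise the filter on the training set, then pass the test set through the same filter, as described in http://weka.wikispaces.com/Batch+filtering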
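Below is a minimal sketch of batch filtering with the Weka Java API. It assumes Weka 3.8 on the classpath and raw ARFF files shaped like the one in the question, with 'class' as the last attribute. The file names `train.arff`/`test.arff` and the helper `batchFilter` are hypothetical. The filter is set up on the training set only and then reused for the test set, so both outputs share one header.

```java
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class BatchFilterExample {

    // Batch filtering: the word dictionary is built from the training set
    // only, and the same filter object is then applied to the test set, so
    // both results have the same attributes in the same order.
    static Instances[] batchFilter(Instances train, Instances test) throws Exception {
        StringToWordVector filter = new StringToWordVector();
        filter.setInputFormat(train);                             // first batch defines the output format
        Instances filteredTrain = Filter.useFilter(train, filter);
        Instances filteredTest = Filter.useFilter(test, filter);  // reuses that format
        return new Instances[] {filteredTrain, filteredTest};
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file names: the unfiltered ARFF files with the 'post'
        // string attribute and the 'class' attribute last.
        Instances train = DataSource.read("train.arff");
        Instances test = DataSource.read("test.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        Instances[] filtered = batchFilter(train, test);
        System.out.println("compatible: " + filtered[0].equalHeaders(filtered[1]));

        // Train Naive Bayes on the filtered training set and evaluate it on
        // the filtered test set.
        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(filtered[0]);
        Evaluation eval = new Evaluation(filtered[0]);
        eval.evaluateModel(nb, filtered[1]);
        System.out.println(eval.toSummaryString());
    }
}
```

Another option works without writing code. In the Explorer, choose `weka.classifiers.meta.FilteredClassifier` with StringToWordVector as its filter and NaiveBayes as its classifier. It builds the filter from the training data and applies it unchanged to the test data, so the unfiltered ARFF files can be used directly as the training set and the supplied test set.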