weka, text-classification, rweka

How to link Weka 10-fold cross-validation predictions back to the original comments in text classification


Is there any way to map my predicted results back to the original comments after text classification with 10-fold cross-validation?

Here is part of the result for 2000 comments of class non-sarc and sarc:

inst#,actual,predicted,error,prediction
1,2:non-sarc,2:non-sarc,,1
2,2:non-sarc,1:sarc,+,1
3,2:non-sarc,2:non-sarc,,1
4,2:non-sarc,2:non-sarc,,1
5,2:non-sarc,2:non-sarc,,1
.
.
101,1:sarc,1:sarc,,1
102,1:sarc,2:non-sarc,+,1
103,1:sarc,1:sarc,,1
104,1:sarc,1:sarc,,1
105,1:sarc,1:sarc,,1
.
.

It looks like Weka has rearranged my comments by class before splitting them into training and test sets. How can I map these results back to the original comments, which are not in this order? I have tried rearranging the comments by class (non-sarc, then sarc), but I am confused about which fold is tested first: the first fold, the last fold, or something else?

Thanks in advance.


Solution

  • Since no one answered my question and I have figured it out myself, I hope this helps others facing the same issue.

    1. In Preprocess, choose Filter > unsupervised > attribute > AddID and add the ID as the first attribute [IDIndex: first]. This gives each original comment its own ID.

    Screenshot 1: Add IDIndex

    2. In Classify, choose a classifier. Under Test options, select 10-fold cross-validation. In More options, choose an output format for the predictions and set its attributes field to 1, so that the ID is printed with each prediction [attributes: 1].

    Screenshot 2: Attribute and Output

    3. Start the run. The output shows the actual label and the prediction for each instance. Errors are marked with + and the ID refers to the original comment as it was numbered before cross-validation.

    Screenshot 3: Output
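    Once the ID is in the output, linking the predictions back to the comments can be done outside Weka. The following is only a minimal Python sketch, not part of the Weka workflow above. It assumes the predictions were saved in CSV format, that the extra column is named ID (the AddID default, as far as I know), and that the IDs follow the original comment order starting at 1. The function name link_predictions, the sample rows, and the placeholder comments are all hypothetical.

```python
import csv
import io


def link_predictions(predictions_csv, comments_by_id, id_column="ID"):
    """Join Weka CSV prediction rows to the original comments via the ID column."""
    linked = []
    for row in csv.DictReader(io.StringIO(predictions_csv)):
        # Accept both "3" and "3.0" in case the numeric ID is printed with decimals.
        comment_id = int(float(row[id_column]))
        linked.append({
            "id": comment_id,
            "comment": comments_by_id[comment_id],
            # Labels look like "2:non-sarc"; keep only the class name.
            "actual": row["actual"].split(":", 1)[1],
            "predicted": row["predicted"].split(":", 1)[1],
            # Weka marks a misclassified instance with "+" in the error column.
            "misclassified": row["error"].strip() == "+",
        })
    # Sort by ID to restore the original comment order.
    return sorted(linked, key=lambda r: r["id"])


# Hypothetical sample: three prediction rows with the ID column appended.
sample_output = """inst#,actual,predicted,error,prediction,ID
1,2:non-sarc,2:non-sarc,,1,3
2,2:non-sarc,1:sarc,+,1,1
1,1:sarc,1:sarc,,1,2
"""

# Hypothetical original comments, keyed by the ID that AddID would assign.
comments = {
    1: "placeholder comment A",
    2: "placeholder comment B",
    3: "placeholder comment C",
}

for record in link_predictions(sample_output, comments):
    print(record["id"], record["actual"], record["predicted"],
          record["misclassified"], record["comment"])
```

    Reading the rows by column name rather than by position means the sketch does not depend on where Weka places the ID column in the output.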

    All the best!