Tags: machine-learning, classification, weka

Classifying a data set with the StringToWordVector filter in Weka


I'm new to Weka.

I have a data set of Twitter data about a specific company. The filter I used is StringToWordVector, and I changed the wordsToKeep option to 100 to try to improve the accuracy. Then I applied these classifiers: KStar (55%), RandomForest (57%), and SMO (58%). These are not very good results.


Are there any ideas that would help me improve these results?


Solution

  • First, try preprocessing your data. Twitter data contains a lot of noise. Remove:

    1. URLs
    2. Retweets
    3. Hashtags
    4. Special characters

    Another thing you can do is use n-grams. Try different n-gram sizes and check which one suits your data best. My suggestion is to go with unigrams + bigrams.
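To make the unigrams + bigrams suggestion concrete, here is a minimal Python sketch of the features that combination produces for one tweet. The function names `ngrams` and `unigrams_and_bigrams` are made up for illustration, and whitespace tokenization is an assumption. Inside Weka itself, the equivalent would likely be to give StringToWordVector an NGramTokenizer with a minimum size of 1 and a maximum size of 2.

```python
def ngrams(tokens, n):
    """Return all n-grams over a token list, each joined into one string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unigrams_and_bigrams(text):
    """Lowercase, split on whitespace (an assumption), and return
    the unigrams followed by the bigrams."""
    tokens = text.lower().split()
    return ngrams(tokens, 1) + ngrams(tokens, 2)


features = unigrams_and_bigrams("Service was not good")
print(features)
# ['service', 'was', 'not', 'good', 'service was', 'was not', 'not good']
```

The bigram "not good" is the kind of feature that unigrams alone miss, which is one reason the combination can help on sentiment-style data.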

    I also suggest using the NaiveBayesMultinomial classifier. It tends to work very well for text classification, especially sentiment analysis, and it is very fast too. If you want code to preprocess the data, let me know :)
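As a rough idea of what such preprocessing code could look like, here is a minimal Python sketch of the cleanup steps listed above, using only the standard `re` module. It is not the answerer's own code. The function name `clean_tweet` and the exact patterns are assumptions: "retweets" is interpreted here as stripping a leading "RT @user:" marker (you may prefer to drop retweeted tweets entirely), and "special characters" as anything other than lowercase letters, digits, and whitespace.

```python
import re

# All patterns below are illustrative assumptions, not a standard.
URL_RE = re.compile(r"https?://\S+|www\.\S+")    # links
RETWEET_RE = re.compile(r"^\s*RT\s+@\w+:?\s*")   # leading "RT @user:" marker
HASHTAG_RE = re.compile(r"#\w+")                 # hashtags
SPECIAL_RE = re.compile(r"[^a-z0-9\s]")          # everything else non-alphanumeric


def clean_tweet(text):
    """Strip URLs, the retweet marker, hashtags, and special characters."""
    text = URL_RE.sub(" ", text)
    text = RETWEET_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = text.lower()
    text = SPECIAL_RE.sub(" ", text)
    # Collapse the runs of spaces left behind by the substitutions.
    return " ".join(text.split())


print(clean_tweet("RT @acme: Love the new phone!!! #awesome http://t.co/abc123"))
# love the new phone
```

The cleaned strings can then be written back to the ARFF or CSV file that Weka loads, before StringToWordVector is applied. Note that removing special characters also removes emoticons, so it is worth checking whether that helps or hurts on your data.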