Search code examples
datasetsentiment-analysisweb-mining

Good dataset for sentiment analysis?


I am working on sentiment analysis and I am using dataset given in this link: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html and I have divided my dataset into 50:50 ratio. 50% are used as test samples and 50% are used as train samples and the features extracted from train samples and perform classification using Weka classifier, but my predication accuracy is about 70-75%.

Can anybody suggest some other datasets which can help me to increase the result - I have used unigram, bigram and POStags as my features.


Solution

  • There are many sources to get sentiment analysis dataset:

    Anyway, it does not mean it will help you to get a better accuracy for your current dataset because the corpus might be very different from your dataset. Apart from reducing the testing percentage vs training, you could: test other classifiers or fine tune all hyperparameters using semi-automated wrapper like CVParameterSelection or GridSearch, or even auto-weka if it fits.

    It is quite rare to use 50/50, 80/20 is quite a commonly occurring ratio. A better practice is to use: 60% for training, 20% for cross validation, 20% for testing.