I am working on a text mining project in RapidMiner. I use a labeled tweet data set (8000 samples, each marked yes or no for earthquake-related) to train Naive Bayes, MLP (deep learning), and LibSVM classifiers, and then use them to classify 28000 unlabeled tweets as yes or no. Here are the results of the three machine learning algorithms:
Naive: accuracy = 80%, number of tweets labeled "yes" = 6056
MLP: accuracy = 86%, number of tweets labeled "yes" = 2300
LibSVM: accuracy = 92%, number of tweets labeled "yes" = 53
My question is: why do the numbers of tweets labeled "yes" differ so drastically between the three models?
I assume the accuracy figures you quote come out of the model-building process on your labeled data set. In that case they describe how well the trained models can "reproduce" the correct labels of your training data. The big difference in the number of "yes" labels assigned to your unknown, unlabeled tweets seems to indicate strong overfitting in your models. This means the models are very well trained to reproduce the training data, but fail to generalize to new, unknown data.
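To make the disagreement concrete, the counts from the question can be turned into the fraction of the 28000 unlabeled tweets that each model marks as "yes". This is only a quick sanity check, not a diagnosis; the variable and function names are made up for illustration.

```python
# Counts copied from the question; 28000 is the size of the unlabeled set.
UNLABELED_TOTAL = 28000
predicted_yes = {"Naive": 6056, "MLP": 2300, "LibSVM": 53}


def yes_rate(count, total=UNLABELED_TOTAL):
    """Fraction of the unlabeled tweets that a model labeled 'yes'."""
    return count / total


rates = {name: yes_rate(count) for name, count in predicted_yes.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2%} of unlabeled tweets predicted 'yes'")
```

If the unlabeled tweets resemble the labeled ones, it may help to compare these fractions with the share of "yes" tweets in the 8000 labeled samples. A model whose predicted share is far from that value is a likely candidate for poor generalization.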
As a first suggestion, check your validation steps. Basic techniques like cross-validation help you detect overfitting, but there are many ways to "trick" yourself by leaking knowledge about your test set into your training data.
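The idea behind cross-validation can be sketched in plain Python. This is not how RapidMiner implements it, and the keyword-based "model" is a toy stand-in for a real classifier; all names and the tiny data set are hypothetical. The point to note is that everything learned from data, including the vocabulary, is built inside each training fold only, so nothing from the held-out fold leaks into training.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices once and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_keyword_model(texts, labels):
    """Toy classifier: keep the words that occur only in 'yes' tweets."""
    yes_words, no_words = set(), set()
    for text, label in zip(texts, labels):
        target = yes_words if label == "yes" else no_words
        target.update(text.lower().split())
    return yes_words - no_words


def predict(model, text):
    """Label a tweet 'yes' if it contains any learned 'yes'-only word."""
    return "yes" if model & set(text.lower().split()) else "no"


def cross_val_accuracy(texts, labels, k=5, seed=0):
    """Return one accuracy per fold, each measured on held-out tweets only."""
    scores = []
    for held_out in k_fold_indices(len(texts), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(texts)) if i not in held]
        # The model sees only the training fold, never the held-out tweets.
        model = train_keyword_model(
            [texts[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct = sum(predict(model, texts[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Hypothetical miniature data set, for illustration only.
texts = [
    "earthquake shook the city", "strong earthquake tremor felt",
    "tremor and aftershock reported", "aftershock felt downtown",
    "quake damage reported", "lovely sunny day",
    "coffee with friends", "new phone arrived today",
    "traffic is slow again", "watching a movie tonight",
]
labels = ["yes"] * 5 + ["no"] * 5

fold_scores = cross_val_accuracy(texts, labels, k=5)
print("per-fold accuracy:", fold_scores)
print("mean accuracy:", sum(fold_scores) / len(fold_scores))
```

If the accuracy measured this way is clearly lower than the accuracy measured on the training data itself, that gap is a hint of overfitting.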
However, without the specific process setup we can only speculate.