I have a classification problem. My dataset contains physiological data (pulse, skin resistance, etc.; 4 features) from an experiment with 19 persons. During the experiment each person went through a sequence of stages that influenced them, which is why the data is divided into 10 classes, one per stage. I prepared two versions of the data: one with everything pooled together (starting with the first person and ending with the last), and one split into a training set of 17 persons and a test set of 2 persons. I classify the data with Weka's Random Forest, and surprisingly, 10-fold cross-validation on the pooled dataset gives almost perfect results, which seems very strange to me for a problem with 10 classes and only 4 features. But when I use the separate training and test sets, I get very bad results. I also tried putting two different persons in the test set, with the same bad results. What am I missing?
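The same pattern is easy to reproduce on synthetic data whenever each person has a strong individual baseline in the features. This is a hypothetical scikit-learn sketch (not the actual Weka pipeline); all names and parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, KFold, GroupKFold

rng = np.random.default_rng(0)
n_persons, n_classes, reps, n_features = 19, 10, 5, 4

X, y, groups = [], [], []
for p in range(n_persons):
    baseline = rng.normal(0, 10, n_features)   # strong per-person offset
    slope = rng.choice([-1, 1], n_features)    # person-specific response direction
    for c in range(n_classes):
        for _ in range(reps):
            X.append(baseline + slope * c * 0.5 + rng.normal(0, 0.1, n_features))
            y.append(c)
            groups.append(p)
X, y, groups = np.array(X), np.array(y), np.array(groups)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Pooled 10-fold CV: samples from the same person land in both
# training and test folds.
pooled = cross_val_score(
    clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0)
).mean()

# Person-held-out CV: each fold tests on persons never seen in training,
# like the 17-vs-2 split in the question.
grouped = cross_val_score(
    clf, X, y, groups=groups, cv=GroupKFold(n_splits=9)
).mean()

print(f"pooled 10-fold: {pooled:.2f}  person-held-out: {grouped:.2f}")
```

In this construction the pooled estimate looks far better than the person-held-out one, because the forest can exploit each person's baseline when that person appears on both sides of the split.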
That is a high-variance problem: your classifier can fit the training data perfectly but is not able to generalize well. Read about the bias/variance tradeoff and think about ways to improve generalization, for example by switching to a classifier that generalizes better, or by decreasing the number of trees in the random forest.
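In scikit-learn terms (Weka's RandomForest exposes analogous options), a deliberately constrained forest might look like this; the toy data and all parameter values are illustrative assumptions to be tuned on a validation set:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in data: 4 features, 10 classes, as in the question.
X, y = make_classification(n_samples=500, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=10,
                           n_clusters_per_class=1, random_state=0)

clf = RandomForestClassifier(
    n_estimators=30,      # fewer trees, as suggested above
    max_depth=5,          # cap tree depth so trees cannot memorize
    min_samples_leaf=5,   # forbid tiny leaves that fit single points
    random_state=0,
).fit(X, y)
print(clf.score(X, y))
```

Capping depth and leaf size usually matters more than the raw tree count, since each individual tree is what overfits; more trees mostly average that out.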
It is also possible that you simply have too little data for training. With few examples, your classifier can fit the training set perfectly (a small number of examples is easy to tell apart) but cannot generalize at all, because it has not seen enough data to sample the problem space in any reasonable way.
Having just 19 persons supports the second hypothesis: for generalizing to new people, the effective sample size is the number of persons, not the number of rows, and 17 training subjects is not even close to enough for most ML algorithms.
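One way to check this hypothesis is a person-held-out learning curve: if validation accuracy is still climbing as training persons are added, the dataset is too small. A hypothetical scikit-learn sketch, where the random placeholder arrays stand in for the real data (so this particular curve stays at chance level):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve, GroupKFold

# Placeholder data: replace with the real features, labels, person ids.
rng = np.random.default_rng(0)
X = rng.normal(size=(190, 4))
y = rng.integers(0, 10, 190)
groups = np.repeat(np.arange(19), 10)  # 19 persons, 10 rows each

sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(random_state=0), X, y,
    groups=groups, cv=GroupKFold(n_splits=5),
    train_sizes=np.linspace(0.2, 1.0, 5),
)
# A validation curve that keeps rising at the right edge means more
# persons would help; a flat low curve means the features do not
# transfer between persons at all.
print(sizes, val_scores.mean(axis=1))
```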