I am working on a machine learning application and use Weka to test and compare classification algorithms. After testing in Weka, I decided to use the J48 decision tree. I parsed the pruned tree that Weka produced and implemented it as if-then rules in C. However, when I run my program on the same data that I gave Weka as input, the results do not match Weka's confusion matrix. In Weka's test options I selected "Use training set", and I used the decision tree from that run. Here are the confusion matrix and my results:
=== Confusion Matrix ===
    a     b     c     d     e     f     g   <-- classified as
  178     1     0     1    13     2     7 | a = InstantMessaging
    4    29    11     1    14    46    25 | b = Mail
    1     3  1051     4    32   921    54 | c = Music
    4     0    14  9596    10     4    10 | d = P2P
   10     1    46     6   607   263    59 | e = SocialMedia
    4     1   230     2    44  7619    63 | f = VideoStream
    5     0    57     1    57   167  1016 | g = WebBrowsing
The results from my program:
"instantMessaging" => 210,
"mail" => 33,
"music" => 4933,
"p2p" => 9886,
"socialMedia" => 1220,
"videoStream" => 4958,
"webBrowsing" => 1054,
"total" => 22294,
Everything is the same (decision tree, data, feature values, functions, etc.), so why do I get different results? Is it possible that Weka produces or displays an incorrect decision tree?
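To make the two outputs directly comparable, it may help to reduce the confusion matrix to one count per predicted class, which is the sum of each column. The sketch below assumes that the counts from my program are numbers of instances assigned to each class; the names `predicted_count` and `print_comparison` are made up for illustration.

```c
#include <stdio.h>

#define NUM_CLASSES 7

static const char *const class_names[NUM_CLASSES] = {
    "InstantMessaging", "Mail", "Music", "P2P",
    "SocialMedia", "VideoStream", "WebBrowsing"
};

/* Confusion matrix copied from the Weka output:
   rows are actual classes, columns are predicted classes. */
static const int confusion[NUM_CLASSES][NUM_CLASSES] = {
    {178,  1,    0,    1,  13,    2,    7},
    {  4, 29,   11,    1,  14,   46,   25},
    {  1,  3, 1051,    4,  32,  921,   54},
    {  4,  0,   14, 9596,  10,    4,   10},
    { 10,  1,   46,    6, 607,  263,   59},
    {  4,  1,  230,    2,  44, 7619,   63},
    {  5,  0,   57,    1,  57,  167, 1016}
};

/* Per-class counts reported by the C program
   (assumed here to be predicted-class counts). */
static const int program_counts[NUM_CLASSES] = {
    210, 33, 4933, 9886, 1220, 4958, 1054
};

/* Sum one column: how many instances Weka assigned to that class. */
int predicted_count(const int m[NUM_CLASSES][NUM_CLASSES], int col)
{
    int sum = 0;
    for (int row = 0; row < NUM_CLASSES; row++) {
        sum += m[row][col];
    }
    return sum;
}

/* Print Weka's predicted count next to the program's count for each class. */
void print_comparison(void)
{
    for (int c = 0; c < NUM_CLASSES; c++) {
        int weka = predicted_count(confusion, c);
        printf("%-16s weka=%5d program=%5d diff=%d\n",
               class_names[c], weka, program_counts[c],
               program_counts[c] - weka);
    }
}
```

For the matrix above, the column sums are 206, 35, 1409, 9611, 777, 9022 and 1234, which add up to the same total of 22294. Under the stated assumption, the largest gaps are in Music (1409 vs. 4933) and VideoStream (9022 vs. 4958).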
After digging deeper, I found the answer. The problem was caused by a function that computes one of the features. That function had been changed, so the feature values my program computed no longer matched the values in the ARFF file. All results are consistent now.
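A check along the following lines would have caught this earlier. It is only a sketch: it assumes each instance is available as an array of numeric feature values, once as read from the ARFF file and once as computed by the program, and `first_feature_mismatch` is a hypothetical helper name.

```c
#include <stddef.h>

/* Compare the feature values of one instance as stored in the ARFF file
   with the values computed by the program.
   Returns the index of the first feature whose values differ by more
   than tol, or -1 if all n features agree within tol. */
long first_feature_mismatch(const double *arff_row,
                            const double *computed_row,
                            size_t n, double tol)
{
    for (size_t i = 0; i < n; i++) {
        double diff = arff_row[i] - computed_row[i];
        if (diff < 0.0) {
            diff = -diff;   /* absolute difference without needing math.h */
        }
        if (diff > tol) {
            return (long)i;
        }
    }
    return -1;
}
```

Running this over every instance and logging the first mismatching feature index points straight at the feature function that has drifted from the one used to build the ARFF file.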