I am having difficulty understanding how both classifiers work under the hood. So far I have deduced that NaiveBayes predicts an outcome by 'uncoupling' multiple pieces of evidence and treating each piece of evidence as independent. But when compared to another classification algorithm like J48 or RandomTree, how exactly do they differ from one another?
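To make my current understanding concrete, here is a rough sketch of what I think each kind of classifier does. This is my own illustration in Python, not Weka's actual code, and the attributes and thresholds are made up:

```python
import math

# My understanding of how NaiveBayes scores a class: it multiplies one
# per-attribute likelihood per piece of evidence, treating every attribute
# as independent given the class ("uncoupled" evidence).
def naive_bayes_score(instance, class_prior, per_attribute_likelihoods):
    # per_attribute_likelihoods: one function per attribute,
    # each returning P(attribute_value | class)
    log_score = math.log(class_prior)
    for value, likelihood in zip(instance, per_attribute_likelihoods):
        log_score += math.log(likelihood(value))
    return log_score  # predict the class with the highest score

# Whereas my understanding of J48/RandomTree is that they never multiply
# evidence together; they route the instance through a series of learned
# single-attribute tests. A hypothetical tree, for illustration only:
def tiny_tree_predict(instance):
    if instance[0] <= 1.52:        # split on attribute 0
        return "class_A"
    elif instance[3] <= 2.0:       # split on attribute 3
        return "class_B"
    else:
        return "class_C"
```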
For instance, the table below shows the percentage of correctly classified instances for two data sets. From it I conclude that both selected classifiers are better suited to the Labor dataset, since they correctly classify a higher proportion of instances there than on the diabetes dataset.
https://i.sstatic.net/TtB3Q.png
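(The table above came from the Weka Explorer. For anyone who wants to reproduce a similar comparison programmatically, a rough sketch is below; GaussianNB and DecisionTreeClassifier are only approximate scikit-learn stand-ins for Weka's NaiveBayes and J48, and "labor.csv" is a placeholder for however you export the ARFF file.)

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# "labor.csv" is a placeholder for a CSV export of the Weka ARFF file,
# with the class attribute in a column named "class".
data = pd.read_csv("labor.csv")
X, y = data.drop(columns="class"), data["class"]

# 10-fold cross-validated accuracy, i.e. "% correctly classified instances"
for name, clf in [("NaiveBayes-ish", GaussianNB()),
                  ("J48-ish tree", DecisionTreeClassifier())]:
    acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name}: {acc:.1%} correctly classified")
```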
However, as seen below, NaiveBayes performs terribly on the Glass data set. What is the reason behind this? Is it down to anomalies in the data set (i.e. something we could spot from the standard deviation or mean)?
https://i.sstatic.net/CHfVb.png
Is anybody able to provide a layman's description of both classifiers, with regards to the results above?
(Sorry, due to my low reputation, I can't post images).
In the glass dataset, all values (except for "RI") are percentages, which for each row sum up to ~100%. So they are by definition NOT independent.
For instance, if a glass contains 50% silicon (Si) and 30% aluminium (Al), these two components alone account for 80% of the theoretical 100%, so only 20% is left for all the other elements (Mg, Fe, Na, K, etc.). As a result, the Si value will tend to be automatically negatively correlated with any minor element, and the minor elements will tend to be correlated with each other.
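You can see this effect with a quick simulation (synthetic compositions, not the actual glass data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw positive amounts for one major and four minor components,
# all independent of each other, then "close" them so each row sums to 100%.
raw = rng.gamma(shape=[50, 2, 2, 2, 2], scale=1.0, size=(10_000, 5))
closed = 100 * raw / raw.sum(axis=1, keepdims=True)

# Before closure the major and a minor component are uncorrelated;
# after closure they are negatively correlated purely because of
# the 100% constraint, not because of any real relationship.
print(np.corrcoef(raw[:, 0], raw[:, 1])[0, 1])      # ~0
print(np.corrcoef(closed[:, 0], closed[:, 1])[0, 1])  # clearly negative
```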
In environmental statistics, this is known as the "closed data" problem. Read the introduction of this paper for more info: "Univariate statistical analysis of environmental (compositional) data: Problems and possibilities" (I just googled this).
One way around this is to measure trace elements, which occur at concentrations << 1%. These can indeed be treated as independent.