
Attribute order influences Naive Bayes results in Orange


I have a data file with 13 attributes and a binary class variable. In Orange Canvas, when I apply the 'Naive Bayes classifier' and then check the performance with the 'Test Learner', I find that the results depend on the order in which the attributes are selected in the 'Select Attributes' widget. The difference is not large; for example, the accuracy goes from 0.78 to 0.76.

As the Naive Bayes algorithm consists of multiplying estimated probabilities, the order of the terms should not matter. Closer examination revealed:

  • this only happens for the relative frequencies estimation (not for Laplace)
  • it does not happen for every data file, or for every rearrangement, but it does happen when the first 3 variables are moved to the last 3 places
  • our data file contains zero frequencies
  • it appears that the difference is not due to different probability estimates. When calling the estimators from the command line, the order in which the attributes are presented in the data file does not matter.

The call looks like this:

import Orange

# `data` is the data set, already loaded as an Orange data table
bayes_rl = Orange.classification.bayes.NaiveLearner(estimator_constructor=Orange.statistics.estimate.RelativeFrequency())
bayes_relative = bayes_rl(data)
print bayes_relative.conditional_distributions

Of course, I am assuming here that calling the classifier from the command line is equivalent to selecting the attributes visually in the same order as they appear in the file.

This makes me a bit unsure about what is going on. Is it some kind of rounding error?


Solution

  • The order of the attributes does matter, because machine floating-point numbers have limited precision: every multiplication rounds its result, so multiplying the same terms in a different order can give a slightly different product, in particular when the terms are small (near zero). This is probably the cause of this behavior (the Naive Bayes in Orange uses 32-bit floating-point precision).
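
To illustrate the rounding effect described in the answer, here is a minimal sketch in plain Python. It is not Orange code and does not reproduce Orange's internals. It emulates 32-bit arithmetic by rounding every intermediate result through `struct`, then multiplies 13 made-up probabilities in the original order and again with the first three terms moved to the end. The helper names (`to_float32`, `product_float32`, `count_order_dependent`) and the random probabilities are assumptions made up for this illustration.

```python
import random
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def product_float32(factors):
    """Multiply left to right, rounding to 32-bit precision after each step."""
    result = 1.0
    for factor in factors:
        result = to_float32(result * to_float32(factor))
    return result


def count_order_dependent(trials=1000, n_attributes=13, seed=0):
    """Count the trials in which reordering the terms changes the product."""
    rng = random.Random(seed)
    differing = 0
    for _ in range(trials):
        # Hypothetical per-attribute probabilities, not taken from any data set.
        probs = [rng.uniform(0.01, 1.0) for _ in range(n_attributes)]
        original_order = product_float32(probs)
        # Move the first three terms to the last three places.
        reordered = product_float32(probs[3:] + probs[:3])
        if original_order != reordered:
            differing += 1
    return differing


differing = count_order_dependent()
print("products changed by reordering: {} of 1000".format(differing))
```

The differences this produces are confined to the last few bits of the result, so on their own they would only change a prediction when the scores of the two classes are almost exactly tied. Whether that is what happens inside Orange for this data file is not confirmed here.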