
What does the useLaplace parameter do in the WEKA J48 algorithm?


I am mining a dataset using the J48 tree algorithm.

I have been trying to understand what the useLaplace parameter does. The only thing I have to go by is this:

Whether counts at leaves are smoothed based on Laplace

which is just the documentation which WEKA has provided. I have some questions about this though:

  1. What are counts at leaves?
  2. What is smoothing?
  3. What is Laplace? Is it an algorithm used for smoothing?

Everything I have found online doesn't really go into detail about what this parameter is actually doing, rather just explains that it "turns on Laplace smoothing."


Solution

  • Provost and Domingos found that frequency smoothing of the leaf probability estimates, such as the Laplace correction, significantly improves the probability estimates that a decision tree produces. From what I have read, the counts at a leaf are the numbers of training instances of each class that end up in that leaf. They are used to compute the leaf's probability estimate, which can be defined as:

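    P(class A | x) = TP / (TP + FP)

    where x is an instance that reaches the leaf, TP (true positives) is the number of training instances of class A in that leaf, and FP (false positives) is the number of training instances of any other class in that leaf.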

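    Smoothing adjusts these raw frequencies so that estimates from leaves with few instances are less extreme, which gives more accurate probability estimates. Without it, a leaf holding three instances of class A and none of any other class would report a probability of exactly 1 for A, which so little data rarely justifies.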

    The Laplace correction is a frequency smoothing formula:

    P_Laplace(class A | x) = (TP + 1) / (TP + FP + C)

    where C is the number of classes in the dataset. For an empty leaf the estimate is 1/C, and as the counts grow it approaches the raw frequency.
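
To make the two formulas concrete, here is a minimal Python sketch that computes both the raw and the Laplace-corrected estimates from the class counts at a single leaf. This is not WEKA's code: the function name `leaf_probabilities`, its `use_laplace` argument, and the example counts are hypothetical and only illustrate the arithmetic.

```python
def leaf_probabilities(counts, use_laplace=False):
    """Return per-class probability estimates for one leaf.

    counts: mapping from class label to the number of training instances
            of that class that reached the leaf (the "counts at leaves").
            It must list every class in the dataset, including classes
            with a count of 0, so that len(counts) equals C.
    """
    num_classes = len(counts)      # C in the Laplace formula
    total = sum(counts.values())   # TP + FP, the same for every class

    if use_laplace:
        # (TP + 1) / (TP + FP + C), applied to each class in turn
        return {label: (n + 1) / (total + num_classes)
                for label, n in counts.items()}

    if total == 0:
        raise ValueError("an empty leaf has no raw frequency estimate")
    # TP / (TP + FP), applied to each class in turn
    return {label: n / total for label, n in counts.items()}


# Hypothetical leaves: both are pure, but one holds far more data.
small_leaf = {"A": 3, "B": 0}
large_leaf = {"A": 300, "B": 0}

print(leaf_probabilities(small_leaf))                    # raw: A = 3/3, B = 0/3
print(leaf_probabilities(small_leaf, use_laplace=True))  # A = 4/5, B = 1/5
print(leaf_probabilities(large_leaf, use_laplace=True))  # A = 301/302, B = 1/302
```

The small leaf drops from a raw estimate of 1 for class A to 4/5 once the correction is applied, while the large leaf barely moves. That is the sense in which the counts are "smoothed": the less data a leaf has, the more its estimate is pulled toward the uniform value 1/C.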