
What is Weka's InfoGainAttributeEval formula for evaluating Entropy with continuous values?


I'm using Weka's attribute selection function for Information Gain and I'm trying to figure out what the specific formula Weka uses when dealing with continuous data.

I understand that the usual entropy formula, H(X) = -Σ p(x) log2 p(x), applies when the values in the data are discrete. I also understand that with continuous data one can either use differential entropy or discretize the values. I've tried looking at Weka's explanation of InfoGainAttributeEval and have looked through many other references, but can't find anything.

Maybe it's just me, but would anyone know how Weka implements this case?

Thanks!


Solution

  • I asked the author Mark Hall and he said:

    It uses the supervised MDL-based discretization method of Fayyad and Irani. See the javadocs:
    http://weka.sourceforge.net/doc.stable-3-8/weka/attributeSelection/InfoGainAttributeEval.html

    Also you can see this link for the discretization method:

    http://weka.sourceforge.net/doc.stable-3-8/weka/filters/supervised/attribute/Discretize.html
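
To make the idea concrete, here is a minimal Python sketch of the published Fayyad and Irani procedure: recursively choose the cut point with the highest information gain and keep it only if the gain passes the MDL stopping criterion, then compute ordinary discrete information gain on the resulting bins. This is not Weka's source code. The function names (`entropy`, `mdl_cut_points`, `info_gain`) are hypothetical, and Weka's Discretize filter has options and implementation details that this sketch ignores, so its numbers may not match Weka's exactly.

```python
from bisect import bisect_left
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0


def _split(pairs):
    """Return accepted cut points for (value, label) pairs sorted by value."""
    n = len(pairs)
    labels = [y for _, y in pairs]
    ent_s = entropy(labels)

    # Find the boundary between distinct values with the highest gain.
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        weighted = (i / n) * entropy(labels[:i]) + ((n - i) / n) * entropy(labels[i:])
        gain = ent_s - weighted
        if best is None or gain > best[0]:
            best = (gain, i)
    if best is None:
        return []

    gain, i = best
    left, right = labels[:i], labels[i:]
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))

    # MDL stopping criterion: accept the cut only if
    # gain > (log2(n - 1) + delta) / n.
    delta = log2(3 ** k - 2) - (k * ent_s - k1 * entropy(left) - k2 * entropy(right))
    if gain <= (log2(n - 1) + delta) / n:
        return []

    cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
    return _split(pairs[:i]) + [cut] + _split(pairs[i:])


def mdl_cut_points(values, labels):
    """Cut points for one continuous attribute, in ascending order."""
    return _split(sorted(zip(values, labels)))


def info_gain(values, labels):
    """Information gain of the class given the MDL-discretized attribute."""
    cuts = mdl_cut_points(values, labels)
    bins = [bisect_left(cuts, v) for v in values]  # bin index for each value
    n = len(labels)
    conditional = 0.0
    for b in set(bins):
        subset = [y for bb, y in zip(bins, labels) if bb == b]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional
```

In this sketch, an attribute for which no cut point passes the MDL test ends up in a single bin, so its information gain is 0. For example, `info_gain(list(range(1, 13)), ["a"] * 6 + ["b"] * 6)` finds one cut at 6.5 and gives a gain of 1 bit, while the same kind of attribute with strictly alternating labels gets no cut at all.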