java weka decision-tree missing-data surrogate-pairs

Weka: How can I implement a Surrogate Split in J48 Decision Tree?


Can anybody help me implement an alternative way of handling missing values in the J48 algorithm using the Weka API in Java?

I am sure that using pre-imputation approaches before training the J48 is easy.

But what about using a surrogate split attribute when partitioning the training data (as Breiman does in CART), instead of the standard J48 approach (Quinlan's C4.5), which distributes cases with a missing value across the branches according to the probability distribution observed for the cases with a known value?
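To make the idea concrete, here is a minimal sketch of surrogate selection in plain Java. It does not use the Weka API: the class and method names (`SurrogateSplitSketch`, `bestSurrogate`, `goesLeft`) are hypothetical, attributes are assumed to be numeric columns with `Double.NaN` marking a missing value, and the split is assumed to be binary (`value <= threshold` goes left). It also simplifies CART: only one surrogate is kept, reversed surrogates are not considered, and a case whose surrogate value is also missing simply goes right.

```java
public class SurrogateSplitSketch {

    /** Hypothetical result holder: the attribute and threshold that best mimic the primary split. */
    public static final class Surrogate {
        public final int attribute;
        public final double threshold;
        public final int agreement;

        public Surrogate(int attribute, double threshold, int agreement) {
            this.attribute = attribute;
            this.threshold = threshold;
            this.agreement = agreement;
        }
    }

    /** Counts cases where both values are known and both splits send the case the same way. */
    public static int agreement(double[] primary, double primaryThreshold,
                                double[] candidate, double candidateThreshold) {
        int agree = 0;
        for (int i = 0; i < primary.length; i++) {
            if (Double.isNaN(primary[i]) || Double.isNaN(candidate[i])) {
                continue; // skip cases where either value is missing
            }
            boolean primaryLeft = primary[i] <= primaryThreshold;
            boolean candidateLeft = candidate[i] <= candidateThreshold;
            if (primaryLeft == candidateLeft) {
                agree++;
            }
        }
        return agree;
    }

    /** Tries every observed value of every other attribute as a threshold and keeps the best match. */
    public static Surrogate bestSurrogate(double[][] columns, int primaryIndex, double primaryThreshold) {
        Surrogate best = null;
        for (int a = 0; a < columns.length; a++) {
            if (a == primaryIndex) {
                continue;
            }
            for (double t : columns[a]) {
                if (Double.isNaN(t)) {
                    continue;
                }
                int score = agreement(columns[primaryIndex], primaryThreshold, columns[a], t);
                if (best == null || score > best.agreement) {
                    best = new Surrogate(a, t, score);
                }
            }
        }
        return best;
    }

    /** Routes one case: use the primary attribute if known, otherwise fall back to the surrogate. */
    public static boolean goesLeft(double[] instance, int primaryIndex,
                                   double primaryThreshold, Surrogate surrogate) {
        double value = instance[primaryIndex];
        if (!Double.isNaN(value)) {
            return value <= primaryThreshold;
        }
        // Simplification: if the surrogate value is missing too, the case goes right.
        return surrogate != null
                && !Double.isNaN(instance[surrogate.attribute])
                && instance[surrogate.attribute] <= surrogate.threshold;
    }
}
```

With this approach each case with a missing value goes to exactly one branch, whereas C4.5 sends it to every branch with a fractional weight.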

Can anybody give me some information or tips on where in the Weka API and source code I have to make changes to replace the standard approach with a surrogate split?


Solution

  • Look at the Weka source code for weka.classifiers.trees.j48.C45ModelSelection, from line 152 onward (Find "best" attribute to split on). It uses the information gain ratio as the splitting criterion.
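
As a reminder of what that criterion computes, here is a minimal standalone sketch of the gain ratio in plain Java. It is not Weka's implementation and calls no Weka classes: `GainRatioSketch` and its methods are hypothetical names, the input is assumed to be a table of class weights per branch, and the missing-value adjustments that C4.5 applies are left out.

```java
public class GainRatioSketch {

    /** Shannon entropy in bits of a vector of non-negative weights. */
    public static double entropy(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        if (total == 0.0) {
            return 0.0;
        }
        double h = 0.0;
        for (double c : counts) {
            if (c > 0.0) {
                double p = c / total;
                h -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return h;
    }

    /**
     * Gain ratio of a split, where branchClassCounts[b][k] is the weight
     * of class k in branch b. Returns 0 when the split information is 0.
     */
    public static double gainRatio(double[][] branchClassCounts) {
        int numBranches = branchClassCounts.length;
        int numClasses = branchClassCounts[0].length;
        double[] classTotals = new double[numClasses];
        double[] branchTotals = new double[numBranches];
        double total = 0.0;
        for (int b = 0; b < numBranches; b++) {
            for (int k = 0; k < numClasses; k++) {
                classTotals[k] += branchClassCounts[b][k];
                branchTotals[b] += branchClassCounts[b][k];
                total += branchClassCounts[b][k];
            }
        }
        if (total == 0.0) {
            return 0.0;
        }
        // Expected entropy after the split, weighted by branch size.
        double conditionalEntropy = 0.0;
        for (int b = 0; b < numBranches; b++) {
            conditionalEntropy += (branchTotals[b] / total) * entropy(branchClassCounts[b]);
        }
        double gain = entropy(classTotals) - conditionalEntropy;
        double splitInfo = entropy(branchTotals); // entropy of the branch sizes
        return splitInfo > 0.0 ? gain / splitInfo : 0.0;
    }
}
```

A surrogate split would leave this criterion unchanged for choosing the primary attribute; what changes is how cases with a missing value for that attribute are assigned to branches afterwards.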