Can anybody help me implement an alternative missing-value handling strategy in the J48 algorithm using the Weka API in Java?
I know that applying a pre-imputation approach before training J48 is easy.
But what about using a surrogate split attribute when partitioning the training data (as Breiman does in CART), instead of the standard J48 approach (Quinlan's C4.5), which distributes cases with a missing value across the branches according to the probability distribution of the observed cases with known values?
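To make clear what I mean by a surrogate split, here is a rough sketch in plain Java, not Weka code. It assumes numeric attributes only, missing values encoded as Double.NaN, and a binary primary split of the form "value <= threshold". The names SurrogateSplitSketch, Surrogate, findSurrogate, and goesLeft are hypothetical, and the agreement count is a simplification of CART's surrogate ranking.

```java
class SurrogateSplitSketch {

    /** Hypothetical holder for the chosen surrogate split. */
    static final class Surrogate {
        final int attribute;     // index of the surrogate attribute
        final double threshold;  // surrogate test: value <= threshold
        final boolean reversed;  // true if "left" and "right" must be swapped
        final int agreement;     // rows routed the same way as the primary split

        Surrogate(int attribute, double threshold, boolean reversed, int agreement) {
            this.attribute = attribute;
            this.threshold = threshold;
            this.reversed = reversed;
            this.agreement = agreement;
        }
    }

    /** Finds the split on another attribute that best mimics the primary split. */
    static Surrogate findSurrogate(double[][] data, int primaryAttr, double primaryThreshold) {
        Surrogate best = null;
        for (int a = 0; a < data[0].length; a++) {
            if (a == primaryAttr) continue;
            for (double[] candidate : data) {
                double s = candidate[a];  // try every observed value as a threshold
                if (Double.isNaN(s)) continue;
                int same = 0, opposite = 0;
                for (double[] row : data) {
                    // Only rows where both values are known can be compared.
                    if (Double.isNaN(row[primaryAttr]) || Double.isNaN(row[a])) continue;
                    boolean primaryLeft = row[primaryAttr] <= primaryThreshold;
                    boolean surrogateLeft = row[a] <= s;
                    if (primaryLeft == surrogateLeft) same++; else opposite++;
                }
                int agreement = Math.max(same, opposite);
                if (best == null || agreement > best.agreement) {
                    best = new Surrogate(a, s, opposite > same, agreement);
                }
            }
        }
        return best;
    }

    /** Routes a row: primary split if known, else surrogate, else the given default. */
    static boolean goesLeft(double[] row, int primaryAttr, double primaryThreshold,
                            Surrogate s, boolean defaultLeft) {
        if (!Double.isNaN(row[primaryAttr])) return row[primaryAttr] <= primaryThreshold;
        if (s != null && !Double.isNaN(row[s.attribute])) {
            boolean left = row[s.attribute] <= s.threshold;
            return s.reversed ? !left : left;
        }
        return defaultLeft;  // assumption: caller supplies a fallback direction
    }

    public static void main(String[] args) {
        double[][] data = {
            {1, 10, 5}, {2, 20, 5}, {3, 30, 5}, {4, 40, 5}, {Double.NaN, 35, 5}
        };
        Surrogate s = findSurrogate(data, 0, 2.0);
        System.out.println("surrogate attribute " + s.attribute + ", threshold " + s.threshold);
        System.out.println("row with missing primary goes left: " + goesLeft(data[4], 0, 2.0, s, true));
    }
}
```

So a case with a missing value on the primary attribute would be sent down exactly one branch, chosen by the surrogate, instead of being split into weighted fractions across all branches.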
Can anybody give me some information, tips, or help on where in the Weka API and source code I have to make changes to replace the standard approach with surrogate splits?
Look at the Weka source code for weka.classifiers.trees.j48.C45ModelSelection, from line 152 (Find "best" attribute to split on). It uses the information gain ratio as the splitting criterion.
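For orientation, the gain ratio criterion can be sketched in plain Java as below. This is not Weka's implementation: the class GainRatioSketch and its methods are hypothetical names, the input is assumed to be a table of class counts per branch, and the sketch leaves out missing-value weighting and any extra heuristics C4.5 applies when comparing candidate splits.

```java
import java.util.Arrays;

class GainRatioSketch {

    /** Shannon entropy in bits of a vector of counts. */
    static double entropy(double[] counts) {
        double total = Arrays.stream(counts).sum();
        if (total <= 0) return 0.0;
        double h = 0.0;
        for (double c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** Gain ratio of a split, given counts[branch][class]. */
    static double gainRatio(double[][] counts) {
        double[] classTotals = new double[counts[0].length];
        double[] branchTotals = new double[counts.length];
        double total = 0.0;
        for (int b = 0; b < counts.length; b++) {
            for (int c = 0; c < counts[b].length; c++) {
                classTotals[c] += counts[b][c];
                branchTotals[b] += counts[b][c];
                total += counts[b][c];
            }
        }
        if (total <= 0) return 0.0;

        // Information gain: entropy before the split minus weighted entropy after it.
        double after = 0.0;
        for (int b = 0; b < counts.length; b++) {
            after += branchTotals[b] / total * entropy(counts[b]);
        }
        double gain = entropy(classTotals) - after;

        // Split information: entropy of the branch sizes themselves.
        double splitInfo = entropy(branchTotals);
        return splitInfo > 0 ? gain / splitInfo : 0.0;
    }

    public static void main(String[] args) {
        double[][] counts = {{4, 0}, {0, 4}};  // two pure branches
        System.out.println("gain ratio = " + gainRatio(counts));
    }
}
```

A surrogate-split variant would leave this criterion alone for choosing the primary split and only change how cases with a missing value on that attribute are routed afterwards.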