
Azure Machine Learning Decision Tree Entropy / Information Gain


Is there a way to see Entropy/Information Gain for each feature when training a Decision Tree in Azure ML?


Solution

  • Traditional Node Performance:

    You can currently only view the relative Gini gain within the boosted decision tree models. Right-click the output of a trained boosted decision tree and select Visualize (see the image linked below). Once the trees have loaded, you can click on the nodes of each individual tree to view the split gain at each level.

    Split Gain at Each Node

    Entropy/Information Gain:

    But let's step back and ask why we would want to view entropy. Entropy is a node-specific measurement within an individual tree. Azure Machine Learning does not offer single-tree classifiers such as rpart in R; it only provides tree ensembles, in the form of the decision forest, decision jungle, and boosted decision tree modules.
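    To make the terminology concrete, here is a minimal sketch of how entropy and information gain are computed for a single node split. This is an illustration of the standard definitions, not Azure ML's internal implementation:

    ```python
    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy (in bits) of a list of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(parent, left, right):
        """Entropy reduction from splitting `parent` into `left` and `right`."""
        n = len(parent)
        weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        return entropy(parent) - weighted

    # Toy example: a perfectly separating split of a balanced binary node.
    parent = [0, 0, 0, 0, 1, 1, 1, 1]
    left, right = [0, 0, 0, 0], [1, 1, 1, 1]
    print(information_gain(parent, left, right))  # -> 1.0 (both children are pure)
    ```

    A decision tree learner evaluates this quantity for every candidate split and picks the one with the highest gain, which is exactly the per-node number you were hoping to inspect.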

    Variable Importance:

    Therefore, I am guessing that you are actually looking for a variable/feature importance measurement, i.e. the aggregate (or average) Gini/entropy/information gain over all node splits in all trees of the ensemble. Azure ML has a module that calculates feature importance for a trained model, called Permutation Feature Importance. It works by randomly shuffling the values of each feature, scoring the trained model on the shuffled data, and measuring how much the model's performance degrades.

    Permutation Feature Importance Module
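    The idea behind the module can be sketched in a few lines of plain Python. This is a hypothetical, dependency-free illustration of the general permutation-importance technique, not the module's actual implementation; the `model`, `metric`, and column-list data layout are all assumptions for the example:

    ```python
    import random

    def permutation_importance(model, X, y, metric, n_repeats=5, seed=0):
        """Average drop in `metric` when each column of X is shuffled.

        `model` is any fitted object with a .predict(rows) method; X is a
        list of feature rows. A larger drop means the model relied more
        heavily on that feature.
        """
        rng = random.Random(seed)
        baseline = metric(y, model.predict(X))
        importances = []
        for col in range(len(X[0])):
            drops = []
            for _ in range(n_repeats):
                shuffled = [row[:] for row in X]          # copy the data
                values = [row[col] for row in shuffled]   # isolate one feature
                rng.shuffle(values)                       # break its link to y
                for row, v in zip(shuffled, values):
                    row[col] = v
                drops.append(baseline - metric(y, model.predict(shuffled)))
            importances.append(sum(drops) / n_repeats)
        return importances
    ```

    A feature the model ignores scores ~0, because shuffling it cannot change the predictions; an informative feature scores positive, since scrambling it breaks the association the model learned.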