The following question confuses me a lot. Could you help me with it (preferably by pointing me to an academic reference)?
We normally use the base-2 logarithm to calculate entropy in decision trees. Is this because most nodes only allow binary branches?
If I want a node with more than two branches, is log2 still theoretically valid?
For example, in XGBoost the training set must be supplied as a matrix, which I think means we can only use numerical values as input.
Thank you very much!
Base 2 for the logarithm is almost certainly used because we like to measure entropy in bits. This is just a convention; some people use base e instead (nats instead of bits).
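To spell that out with the standard definition: the Shannon entropy of a discrete distribution with probabilities $p_i$, computed in base $b$, is

$$H_b(X) = -\sum_i p_i \log_b p_i,$$

and since $\log_2 x = \ln x / \ln 2$, the entropy in bits is just the entropy in nats divided by the constant $\ln 2$. Changing the base rescales every entropy (and hence every information gain) by the same factor, so it can never change which split a tree-building algorithm prefers.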
I cannot speak to XGBoost, but for discrete decision problems entropy comes into play as a performance measure, not as a direct consequence of the tree structure. You can calculate the information gain of any split (with any branching factor) straight from the definition of entropy.
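As a concrete illustration, here is a minimal sketch in plain Python (the labels and the three-way split are made-up data, purely for demonstration) that computes the information gain of a split directly from the definition; note that nothing in it assumes binary branching:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent node minus the size-weighted entropy
    of the child nodes. Works for any number of children, i.e.
    any branching factor."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Hypothetical example: a 3-way split of 9 samples with labels A/B.
parent = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]
children = [["A", "A", "A"], ["A", "B", "B"], ["B", "B", "B"]]
print(information_gain(parent, children))  # ~0.685 bits of gain
```

Swapping `log2` for the natural log would scale every gain by the same constant and leave the ranking of candidate splits unchanged.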
If you're looking for a book on information theory and probability, I can highly recommend MacKay's Information Theory, Inference, and Learning Algorithms (the full PDF is freely available). He covers quite a bit of machine learning and statistics, although decision trees are not covered.