I'm curious why XGBoost doesn't support the min_samples_leaf
parameter like the classic gradient boosting classifier in sklearn. And if I do want to control the minimum number of samples in a single leaf, is there any workaround in XGBoost?
You could try using min_child_weight
. According to the documentation, this parameter is the
"minimum sum of instance weight (hessian) needed in a child."
For regression with MSE loss, the sum of instance weights in a leaf equals the number of samples in that leaf, because the second derivative (hessian) of the MSE loss is 1 for every sample, so min_child_weight then behaves exactly like a minimum-samples-per-leaf constraint.
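For example (a minimal sketch; the toy data and the value 20 are purely illustrative), with the default squared-error objective, setting min_child_weight=20 acts like min_samples_leaf=20:

```python
import numpy as np
from xgboost import XGBRegressor

# Toy regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] + 0.1 * rng.normal(size=1000)

# With reg:squarederror every sample's hessian is 1, so the sum of
# hessians in a leaf equals the leaf's sample count, and
# min_child_weight=20 behaves like min_samples_leaf=20.
model = XGBRegressor(
    objective="reg:squarederror",
    min_child_weight=20,
    n_estimators=100,
    max_depth=6,
)
model.fit(X, y)
```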
For classification, the hessian of the logistic loss is p(1 - p), where p is the predicted probability, so the sum is no longer a sample count but a purity-like measure of the leaf: if one class heavily dominates a leaf, each p(1 - p) is close to zero, the sum is small, and there is no need to split that leaf further.
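A small numeric illustration of that purity effect (this is just arithmetic on hypothetical leaf probabilities, not XGBoost's internal code):

```python
import numpy as np

# Hessian of the logistic loss for each sample is p * (1 - p),
# where p is the current predicted probability.
def leaf_hessian_sum(probs):
    probs = np.asarray(probs)
    return np.sum(probs * (1.0 - probs))

# A "pure" leaf: confident predictions contribute almost nothing.
print(leaf_hessian_sum([0.98, 0.99, 0.97, 0.99]))  # ~0.07, likely below min_child_weight -> stop splitting

# A "mixed" leaf: predictions near 0.5 contribute ~0.25 each.
print(leaf_hessian_sum([0.55, 0.48, 0.52, 0.50]))  # ~1.0 -> may still be split
```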
I don't know of a specific reason for not having a min_samples_leaf
parameter. My guess is that its overlap with min_child_weight
would create design complications and confuse users.