Tags: python, scikit-learn, gradientboosting

Python SkLearn Gradient Boost Classifier Sample_Weight Clarification


I am using scikit-learn's GradientBoostingClassifier with stochastic gradient boosting (random subsampling of rows). I pass a sample_weight of 1 for one of the binary classes (outcome = 0) and 20 for the other (outcome = 1). My question is how these weights are applied, in layman's terms.
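
For concreteness, a minimal sketch of the setup described above (the dataset size, class ratio, and subsample value are illustrative assumptions, not part of the question):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    # Imbalanced binary problem: roughly 95% outcome 0, 5% outcome 1
    X, y = make_classification(n_samples=500, weights=[0.95, 0.05],
                               random_state=0)

    # Weight of 1 for outcome 0, weight of 20 for outcome 1
    w = np.where(y == 1, 20.0, 1.0)

    # subsample < 1.0 enables stochastic gradient boosting
    clf = GradientBoostingClassifier(subsample=0.8, random_state=0)
    clf.fit(X, y, sample_weight=w)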

Is it that at each iteration the model selects x rows for the 0 outcome and y rows for the 1 outcome, and the sample_weight setting then kicks in, keeping all of x but oversampling the y (outcome = 1) rows by a factor of 20?

The documentation does not make it clear to me whether sample_weight > 1 amounts to oversampling. I understand that class_weight is different: it does not change the data, only how the model interprets the data via the loss function. Is it true that sample_weight, on the other hand, effectively changes the data fed into the model by oversampling?
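
As an aside, the per-class weighting described here can be generated with scikit-learn's compute_sample_weight utility, which maps a class_weight-style dict onto a per-row sample_weight vector (the toy labels below are purely illustrative):

    import numpy as np
    from sklearn.utils.class_weight import compute_sample_weight

    y = np.array([0, 0, 0, 1])                                 # toy labels
    w = compute_sample_weight(class_weight={0: 1, 1: 20}, y=y)
    print(w)                                                   # [ 1.  1.  1. 20.]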

Thanks


Solution

  • Sample weights are a multiplier factor on each sample's contribution to the loss (and hence to the gradients the trees are fit on); they do not duplicate or oversample rows. The stochastic subsample is drawn uniformly at random, independently of the weights. For integer weights the multiplier is numerically equivalent to repeating rows, as the sketch after the link demonstrates. Here is the code:

    https://github.com/scikit-learn/scikit-learn/blob/f0ab589f/sklearn/ensemble/gradient_boosting.py#L1225
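
A minimal sketch of that equivalence (synthetic data and deterministic settings assumed; with subsample=1.0 and a fixed random_state, a weight of k should behave like repeating the row k times):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=300, random_state=0)
    w = np.where(y == 1, 3.0, 1.0)          # integer-valued weights

    # Fit once with sample_weight ...
    a = GradientBoostingClassifier(subsample=1.0, random_state=0)
    a.fit(X, y, sample_weight=w)

    # ... and once on data where each weighted row is physically repeated
    Xr = np.repeat(X, w.astype(int), axis=0)
    yr = np.repeat(y, w.astype(int))
    b = GradientBoostingClassifier(subsample=1.0, random_state=0)
    b.fit(Xr, yr)

    print(np.allclose(a.predict_proba(X), b.predict_proba(X)))  # expected: True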