machine-learning classification weka naivebayes

How does instance weighting work in machine learning?

In Weka, we have an option to assign weights to instances, which is especially useful when the data set is imbalanced across classes. What I cannot understand is how this weighting actually works.

For example, when we use Naive Bayes or a decision tree as the classification algorithm on a data set in which some instances have a weight of 5, does it mean that the algorithm counts those instances 5 times?

Solution

Sample weighting is classifier-specific; there is no single, universal answer. Many classifiers (and regressors) have their own internal way of using sample weights. For many of them it is equivalent to replicating samples, but remember that weights can be arbitrary positive real numbers, so you can weight a sample by pi even though you cannot replicate it pi times. In Naive Bayes, the sample weights are used inside the probability estimators to weight each sample proportionally, so with integer weights it is equivalent to replication. For decision trees it is considerably more complicated, and for an arbitrary method the answer depends on the model and implementation.
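
To make the Naive Bayes case concrete, here is a minimal sketch in plain Python. It is not Weka's implementation: the function name `weighted_nb_estimates` and the toy data are made up for illustration, there is a single categorical feature, and smoothing is left out. It only shows that summing weights in place of counting instances gives the same estimates as replicating a weight-5 instance 5 times.

```python
from collections import defaultdict
from fractions import Fraction


def weighted_nb_estimates(rows, weights):
    """Estimate P(class) and P(value | class) from weighted instances.

    rows    -- list of (feature_value, class_label) pairs (hypothetical toy format)
    weights -- one positive weight per row
    No smoothing is applied; this is an illustrative simplification.
    """
    class_weight = defaultdict(Fraction)  # total weight seen per class
    pair_weight = defaultdict(Fraction)   # total weight per (value, class)
    for (value, label), w in zip(rows, weights):
        class_weight[label] += Fraction(w)
        pair_weight[(value, label)] += Fraction(w)

    total = sum(class_weight.values())
    priors = {c: cw / total for c, cw in class_weight.items()}
    likelihoods = {
        (v, c): pw / class_weight[c] for (v, c), pw in pair_weight.items()
    }
    return priors, likelihoods


rows = [("sunny", "yes"), ("rainy", "no"), ("sunny", "no")]

# Variant 1: the first instance carries weight 5.
weighted = weighted_nb_estimates(rows, [5, 1, 1])

# Variant 2: the first instance is physically repeated 5 times, all weights 1.
replicated_rows = [rows[0]] * 5 + rows[1:]
replicated = weighted_nb_estimates(replicated_rows, [1] * len(replicated_rows))

print(weighted == replicated)  # the two sets of estimates coincide
```

The same function also accepts non-integer weights (for example `2.5`), which is the case that replication cannot express.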