python machine-learning scikit-learn knn

How is parameter "weights" used in KNeighborsClassifier?


In the sklearn documentation, the parameter weights="distance" of the KNeighborsClassifier class is explained as follows:

‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

It makes sense to me to weight neighboring points and then calculate the prediction as the weighted mean, for instance with KNeighborsRegressor. However, I cannot see how the weights are used in a classification algorithm. According to the book The Elements of Statistical Learning, KNN classification is based on a majority vote, isn't it?


Solution

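  • During classification, the weights are used when computing the mode of the neighbors' labels. Instead of counting how often each class occurs among the neighbors, the weights of the neighbors in each class are summed, and the class with the largest sum is predicted. With uniform weights this reduces to the ordinary majority vote.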

    For more details, see the actual implementation in the scikit-learn source.

    Examples from the documentation of weighted_mode:

    >>> from sklearn.utils.extmath import weighted_mode
    >>> x = [4, 1, 4, 2, 4, 2]
    >>> weights = [1, 1, 1, 1, 1, 1]
    >>> weighted_mode(x, weights)
    (array([4.]), array([3.]))

    The value 4 appears three times: with uniform weights, the result is simply the mode of the distribution.

    >>> weights = [1, 3, 0.5, 1.5, 1, 2]  # deweight the 4's
    >>> weighted_mode(x, weights)
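    (array([2.]), array([3.5]))

    The value 2 now has the highest score: it appears twice, with weights 1.5 and 2, which sum to 3.5. The value 4 scores only 1 + 0.5 + 1 = 2.5, and the value 1 scores 3.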

    You can view the implementation of weighted_mode in sklearn.utils.extmath.
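
To connect the weighted mode above back to KNeighborsClassifier, here is a minimal sketch that redoes the weighted vote by hand and compares it with the classifier's own prediction. The toy data, the query point, and the variable names (`scores`, `manual_pred`, and so on) are made up for illustration. The manual step assumes no neighbor lies at distance zero, a case scikit-learn treats specially.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D training set (illustrative values only).
X = np.array([[1.0], [2.0], [2.5], [8.0], [9.0]])
y = np.array([0, 1, 1, 0, 1])
query = np.array([[1.1]])

# Plain majority vote among the 3 nearest neighbors.
uniform_clf = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X, y)
uniform_pred = uniform_clf.predict(query)[0]

# Inverse-distance weighted vote among the same 3 neighbors.
distance_clf = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X, y)
distance_pred = distance_clf.predict(query)[0]

# Redo the weighted vote by hand: find the neighbors, weight each one by
# 1 / distance, and sum the weights per class label.
dist, idx = distance_clf.kneighbors(query)
neighbor_weights = 1.0 / dist[0]          # assumes no zero distances
neighbor_labels = y[idx[0]]
scores = {
    label: neighbor_weights[neighbor_labels == label].sum()
    for label in distance_clf.classes_
}
manual_pred = max(scores, key=scores.get)

print("uniform vote: ", uniform_pred)
print("distance vote:", distance_pred)
print("manual scores:", scores, "->", manual_pred)
```

In this sketch the three nearest neighbors of the query are one class-0 point at distance 0.1 and two class-1 points at distances 0.9 and 1.4. The uniform vote therefore favors class 1 by two votes to one, while the inverse-distance weights (about 10 for class 0 against about 1.8 for class 1) favor class 0.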