
Need for normalization in the KNN algorithm


Why is normalization necessary in KNN? I know that this process normalizes the effect of all the features on the results, but I believe the 'K' nearest points to a particular point V before normalization will be EXACTLY the same as the 'K' nearest points to V after normalization. So what difference does normalization make with respect to Euclidean distance? After all, KNN depends completely on Euclidean distances. Thanks in advance!


Solution

  • Most normalization techniques will change the 'K' nearest neighbours if the variability differs across dimensions, contrary to the assumption in the question.

    Imagine a dataset of A=(-5,0), B=(-5,1), and C=(5,1). Now consider a query point (4.5, 0). Clearly, C is the closest neighbour.

    After min-max normalization to [-1, 1] in both dimensions, the dataset becomes A=(-1,-1), B=(-1,1), C=(1,1). The query point maps to (0.9, -1) in this new space. Thus, A is now the closest neighbour.
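    A minimal NumPy sketch reproducing the example above; the `nearest` and `scale` helpers are illustrative, not part of any library API:

    ```python
    import numpy as np

    # Dataset from the example: A, B, C
    X = np.array([[-5.0, 0.0],
                  [-5.0, 1.0],
                  [ 5.0, 1.0]])
    labels = ['A', 'B', 'C']
    query = np.array([4.5, 0.0])

    def nearest(points, q):
        # Label of the point with the smallest Euclidean distance to q
        dists = np.linalg.norm(points - q, axis=1)
        return labels[int(np.argmin(dists))]

    print(nearest(X, query))  # -> 'C' in the raw space

    # Min-max normalization of each dimension to [-1, 1],
    # using the per-feature ranges of the dataset
    lo, hi = X.min(axis=0), X.max(axis=0)
    scale = lambda p: 2 * (p - lo) / (hi - lo) - 1

    X_norm = scale(X)          # A=(-1,-1), B=(-1,1), C=(1,1)
    query_norm = scale(query)  # (0.9, -1)

    print(nearest(X_norm, query_norm))  # -> 'A' after normalization
    ```

    Running this prints `C` and then `A`, confirming that min-max scaling flips which point is the nearest neighbour.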