Tags: python, scipy, cosine-similarity

Cosine Similarity normalization


I am trying to compare two 13-D vectors using cosine similarity, but I want all of the columns (features) to be weighted equally. Right now, three features have much larger values than the rest, and they appear to be weighted too heavily in my comparison results. Is there an easy way to normalize the features so that they are all on a similar scale? I am doing this in Python.
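A small illustration of the effect described above, using made-up numbers (the vectors and the `cosine_similarity` helper are hypothetical, not the asker's data or code). Ten small-scale features disagree completely, yet three large-scale features that agree push the similarity to almost 1:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (hypothetical helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 13-D vectors: 10 small-scale features plus 3 large-scale ones.
small_a = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
small_b = 1.0 - small_a  # the opposite pattern on every small feature
large = np.array([1000.0, 2000.0, 1500.0])  # identical in both vectors

a = np.concatenate([small_a, large])
b = np.concatenate([small_b, large])

print(cosine_similarity(small_a, small_b))  # 0.0: the small features are orthogonal
print(cosine_similarity(a, b))  # very close to 1: the three large features dominate
```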


Solution

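  • The usual approach is to standardize each feature, i.e. each column computed across all of your vectors. First recenter it as x = x - np.mean(x). This moves your frame of reference to the center of the data cloud, so the cosine measures directions relative to a typical point instead of relative to the origin.

    Then, for each feature, rescale it as x = x / np.sqrt(np.mean(x**2)). Because x is already centered, this divides it by its standard deviation, so every feature ends up with unit variance and no single feature dominates the dot product just because of its units or magnitude.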
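A minimal sketch of the two steps above in NumPy, assuming the per-feature mean and scale are computed over a whole collection of vectors (a hypothetical 100 × 13 random dataset here), not over just the two vectors being compared. With only two vectors, every standardized feature would collapse to ±1 or 0. The `standardize_columns` name and the zero-variance guard are illustrative additions; `scipy.spatial.distance.cosine` returns the cosine *distance*, so the similarity is `1 - cosine(u, v)`.

```python
import numpy as np
from scipy.spatial.distance import cosine  # cosine distance = 1 - cosine similarity

def standardize_columns(data):
    """Rescale each feature (column) to zero mean and unit RMS (hypothetical helper)."""
    centered = data - data.mean(axis=0)               # step 1: x = x - mean(x)
    rms = np.sqrt(np.mean(centered ** 2, axis=0))     # step 2 divisor; equals data.std(axis=0)
    rms[rms == 0] = 1.0  # assumption: leave constant features at 0 instead of dividing by 0
    return centered / rms

rng = np.random.default_rng(0)
# Hypothetical dataset: 100 samples of 13 features, the last 3 on a much larger scale.
scales = np.array([1.0] * 10 + [1000.0] * 3)
data = rng.normal(size=(100, 13)) * scales + 5.0 * scales

z = standardize_columns(data)
print(np.allclose(z.mean(axis=0), 0.0), np.allclose(z.std(axis=0), 1.0))  # True True

# Compare two rows before and after standardization.
sim_raw = 1.0 - cosine(data[0], data[1])
sim_std = 1.0 - cosine(z[0], z[1])
print(sim_raw, sim_std)
```

Store the mean and scale from the reference collection and apply the same values to any new vector before comparing it, so that all vectors share one coordinate system. If scikit-learn is available, `sklearn.preprocessing.StandardScaler` performs the same zero-mean, unit-variance transformation.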