I am trying to compare two 13-D vectors using cosine similarity, but I want all of the column entries/features to have equal weighting. Right now, I have 3 features with much larger values that appear to be too heavily weighted in my comparison results. Is there an easy way to normalize the different features so that they are on a similar scale? I am doing this in Python.
The usual approach is to recalculate each feature x (one column, taken across all of your vectors) as x = x - np.mean(x). This places your frame of reference at the center of the cluster, so you are "looking at the points from up close".
Then, for each feature, x = x / np.sqrt(np.mean(x**2)). This rescales every feature to unit RMS, so no single feature dominates and the points are more evenly distributed over all possible directions in the feature space.
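Here is a rough sketch of those two steps, assuming your vectors are stacked as the rows of a 2-D NumPy array (one column per feature). The names `X`, `standardize_features` and `cosine_similarity` are just illustrative, and the random data only stands in for your own. Note that the mean and scale have to be estimated from a set of vectors, not from the two vectors you are comparing alone.

```python
import numpy as np

def standardize_features(X):
    """Center each column, then divide it by its RMS (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)                 # x = x - np.mean(x), per feature
    rms = np.sqrt(np.mean(centered**2, axis=0))   # sqrt(mean(x**2)), per feature
    rms[rms == 0] = 1.0                           # leave constant features unscaled
    return centered / rms

def cosine_similarity(a, b):
    """Plain cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up data: 100 vectors with 13 features, the first 3 on a much larger scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 13))
X[:, :3] *= 1000.0

Z = standardize_features(X)
sim = cosine_similarity(Z[0], Z[1])   # every feature now contributes on the same scale
```

If you already use scikit-learn, `sklearn.preprocessing.StandardScaler` should give you the same centering and scaling.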