I am having trouble choosing an adequate distance function to measure the similarity (or dissimilarity) between two relative frequency vectors.
More specifically, I am using shape feature vectors that describe the proportions of the basic shapes (circles, triangles, squares) present in an image. The vectors therefore have the form
[% of circles, % of triangles, % of squares]
For example, if an image contains 4 circles, 2 triangles and 4 squares, its shape feature vector would be:
[0.4, 0.2, 0.4]
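A minimal sketch of how such a vector can be built from detected shapes, assuming the detector returns a list of shape labels (the labels and the function name `shape_feature_vector` are hypothetical, not part of the question):

```python
from collections import Counter

# Hypothetical shape labels, in the order used by the feature vector
SHAPES = ("circle", "triangle", "square")

def shape_feature_vector(shapes):
    """Return [fraction of circles, fraction of triangles, fraction of squares].

    Labels not listed in SHAPES are ignored; an image with no known
    shapes maps to the all-zero vector.
    """
    counts = Counter(shapes)
    total = sum(counts[s] for s in SHAPES)
    if total == 0:
        return [0.0 for _ in SHAPES]
    return [counts[s] / total for s in SHAPES]

detected = ["circle"] * 4 + ["triangle"] * 2 + ["square"] * 4
print(shape_feature_vector(detected))  # [0.4, 0.2, 0.4]
```

Note that the entries of every non-empty vector sum to 1, so each vector is a discrete probability distribution over the three shapes.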
My initial idea was simply to use the Euclidean distance: compute the difference between each pair of corresponding elements of the two feature vectors and combine the results. However, I am not convinced that this is the best approach. Can someone suggest a good way to measure the distance between two such vectors, or an algorithm suited to this situation? Are more sophisticated probabilistic distance functions, such as the chi-squared distance or the Kullback-Leibler divergence, required to obtain good results?
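For comparison, here is a plain-Python sketch of the candidate functions mentioned above (function names are illustrative). It assumes the symmetric histogram form of the chi-squared distance, 0.5 * sum((p - q)^2 / (p + q)), which is only one of several conventions in use. The L1 (Manhattan) distance is included because summing per-element differences without squaring gives L1 rather than Euclidean distance. The Kullback-Leibler divergence is asymmetric and becomes infinite when `q` is zero where `p` is not, for example when one image contains no triangles at all.

```python
import math

def euclidean(p, q):
    # Square the element-wise differences, sum them, take the square root
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # L1 distance: sum of absolute element-wise differences
    return sum(abs(a - b) for a, b in zip(p, q))

def chi_squared(p, q):
    # Symmetric chi-squared distance for histograms (one common convention);
    # bins that are zero in both vectors contribute nothing and are skipped
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum p * log(p / q), in nats; note KL(p || q) != KL(q || p)
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue  # by convention 0 * log(0 / b) = 0
        if b == 0:
            return math.inf  # p has mass where q has none
        total += a * math.log(a / b)
    return total

p = [0.4, 0.2, 0.4]
q = [0.3, 0.3, 0.4]
for name, fn in [("euclidean", euclidean), ("manhattan", manhattan),
                 ("chi-squared", chi_squared), ("KL(p||q)", kl_divergence)]:
    print(f"{name:12s} {fn(p, q):.4f}")
```

Which one works best is an empirical question for the task at hand; the infinite values are the main practical drawback of plain KL on sparse shape vectors.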
Thanks, Peter
Which distance function to use depends on your concrete task.
I guess cosine similarity may be what you want.
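A minimal sketch of cosine similarity for two such vectors in plain Python (the function name is illustrative). Because all entries of a shape vector are non-negative, the result lies between 0 and 1, and 1 minus the similarity can be used as a dissimilarity score (it is not a true metric, since it violates the triangle inequality).

```python
import math

def cosine_similarity(p, q):
    # Dot product divided by the product of the Euclidean norms
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_p * norm_q)

sim = cosine_similarity([0.4, 0.2, 0.4], [0.3, 0.3, 0.4])
print(f"similarity {sim:.4f}, dissimilarity {1 - sim:.4f}")
```

Cosine similarity compares the direction of the vectors and ignores their length; since relative frequency vectors already sum to 1, it mainly changes how differences between the proportions are weighted compared with Euclidean distance.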