algorithm similarity correlated

Is there a standard approach to find related/similar objects?


Suppose I have a set of entities (for example people with their physical characteristics) and I want to find, for a given entity X, all entities related (or similar) to it, for some definition of similarity.

I can easily find such entities for one dimension (all people with height Y ~= X's height within a certain threshold) but is there some approach that I can use to find similar entities considering more than one attribute?


Solution

  • It depends on how you define similarity, but the approach you use for 1D generalizes to any number of dimensions. Assuming each element is represented as a vector, you can measure the distance between two vectors x and y as d = |x - y|, and accept or reject depending on whether d is within some threshold.

    Here, the minus operator is vector subtraction:
    (a1,a2,...,an)-(b1,b2,...,bn)=(a1-b1,a2-b2,...,an-bn)
    and the absolute value is the Euclidean norm (length) of the vector:
    |(a1,a2,...,an)| = sqrt(a1^2 + a2^2 + ... + an^2).

    It is easy to see that this is a generalization of your 1D example: for vectors with a single element, d reduces to |a1 - b1|, the ordinary absolute difference.


    The downside of this approach is that (0,0,0,...,0,10^20) and (0,0,0,...,0) will be very far from each other even though they differ in only one attribute, which might or might not be what you want. If it is not, you will need a different distance metric, and which one really depends on what exactly you are after.
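
A minimal Python sketch of the distance-and-threshold idea described in the answer. The function names (euclidean_distance, find_similar), the sample data, and the threshold value are illustrative assumptions, not part of the original answer.

```python
import math

def euclidean_distance(x, y):
    """Return d = |x - y| for two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Component-wise subtraction, then the Euclidean norm of the result.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def find_similar(target, candidates, threshold):
    """Return the candidates whose distance to target is at most threshold."""
    return [c for c in candidates if euclidean_distance(target, c) <= threshold]

# Hypothetical data: each person is (height in cm, weight in kg).
people = [(180, 75), (182, 78), (160, 55), (179, 74)]
similar = find_similar((180, 75), people, threshold=5.0)

# With one attribute, the distance is the plain absolute difference.
one_dim = euclidean_distance((3,), (7,))

# The downside noted above: a single huge component dominates the distance.
far = euclidean_distance((0, 0, 0, 1e20), (0, 0, 0, 0))
```

Note that the target itself is returned if it is among the candidates (its distance is 0), and that attributes measured on very different scales contribute unequally to d, so the choice of units affects which entities count as similar.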