Search code examples
algorithmselectiontheory

What is the theory to select one item based on various criteria?


I need to solve a problem where an item A must be compared to thousands of other items, and find out which items are the most similar to item A.

I want to assign a weight to each of these items depending on how similar they are to item A. Various criteria will determine the final weight. For instance, if item1.someProperty == otherItem.someProperty, then I increase the weight by 5, if item1.anotherProperty == otherItem.anotherProperty, then I only increase the weight by 1, because someProperty is more important than anotherProperty.

The reason I'm describing all that, is that I want to know if there's any theory that will help me create this system. In particular, how to choose the weight of each criteria, how to compute the final weight of an item, and how to architecture all that.

So does anybody know if there is any theory that could help? Or perhaps there is a better way to do what I'm trying to do?


Solution

  • You could think of your properties as dimensions and compose a distance out of them. If there is correlation between the properties, you could take that into account as well (google for Mahalanobis distance).

    But basically it winds down to

     float distance(a, b) {
        return w1 * ABS(a.x - b.x)
             + w2 * ABS(a.y - b.y)
               ...
        ;
     } 
    

    Instead of summing the terms, you could sum the squared terms (to penalise big differences), anything goes.

    BTW for nominal data you could use some entropy-based measure of difference.