algorithm · computer-vision · classification · similarity

Similarity measure in classification algorithm


I have developed an algorithm that classifies sperm motility into four classes (1, 2, 3 and 4) using some velocity values: VSL (straight-line velocity), VCL (curvilinear velocity) and LIN (linearity). I do this on sperm trajectories extracted from videos.

The information I have is: Video_n results (using a heuristic algorithm):

  • Class 1: 10% (10% of sperms are in this motility class)
  • Class 2: 20%
  • Class 3: 30%
  • Class 4: 40%

Video_n results (according to an expert):

  • Class 1: 10%
  • Class 2: 30%
  • Class 3: 20%
  • Class 4: 40%

I'm having trouble finding a method to measure the similarity between these two sets of results (i.e. the efficacy of the algorithm).

For example, if I compute

Class 1(heuristic) / Class 1(expert) = 1, i.e. 100%, it means that the similarity for Class 1 is 100%, so the heuristic algorithm is "perfect" at classifying Class 1 sperm.

Class 2(heuristic) / Class 2(expert) = 0.66, i.e. 66%, it means that the similarity for Class 2 is 66%, so the heuristic algorithm is "good" at classifying Class 2 sperm.

But for Class 3 I would get 150%, which confuses me. Does anyone have an idea of another measure I could use for similarity, or what that 150% means in terms of efficacy?


Solution

  • There are a number of possible measures of similarity. Ideally, you should derive one yourself that takes account of the reason why you are doing this classification, so that good similarity scores correspond to something that performs well when you use it in practice. Here are a few examples.

    1) Cosine similarity. Treat the two sets of percentages as vectors, normalize them to unit vectors, and take the dot product, which gives you a value between 0 and 1 (since percentages are non-negative). So in your example you would have (10 * 10 + 20 * 30 + 30 * 20 + 40 * 40) / (sqrt(10 * 10 + 20 * 20 + 30 * 30 + 40 * 40) * sqrt(10 * 10 + 30 * 30 + 20 * 20 + 40 * 40)) = 2900 / 3000 ≈ 0.967.
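As a sketch, the cosine similarity above can be computed in plain Python with no external libraries:

```python
import math

def cosine_similarity(a, b):
    # Treat the two percentage breakdowns as vectors and take the
    # dot product of their unit vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

heuristic = [10, 20, 30, 40]  # percentages from the question
expert = [10, 30, 20, 40]
print(round(cosine_similarity(heuristic, expert), 4))  # → 0.9667
```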

    2) If the expert and the classification system had classified the same sperm and you kept track of which was which, you could work out what percentage the classification system got correct. You didn't do this, but you can work out the maximum possible agreement consistent with the data you have by taking, for each class, the minimum of the two percentages assigned to that class. In your example, the classification system could have been correct for at most min(10, 10) + min(20, 30) + min(30, 20) + min(40, 40) = 90 percent. This score will be somewhere between 0 and 100 percent, with 100 percent for a perfect match.
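The maximum-possible-agreement bound described above can be sketched like this:

```python
def max_possible_agreement(heuristic, expert):
    # Per class, the heuristic can agree with the expert on at most
    # min(h_i, e_i) percent of the sperm; sum this bound over classes.
    return sum(min(h, e) for h, e in zip(heuristic, expert))

print(max_possible_agreement([10, 20, 30, 40], [10, 30, 20, 40]))  # → 90
```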

    3) If the result of your classification is used as an input to a diagnostic test (e.g. "patient will be infertile if..."), then instead of comparing the classification output directly, look at how often your classification produces the same test result as the expert's classification does; see http://en.wikipedia.org/wiki/Receiver_operating_characteristic.
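As a minimal sketch of that idea: the decision rule below (`diagnose`, flagging a sample when the Class 4 share exceeds a threshold) is purely hypothetical and not from the question; the point is only that you compare the downstream decisions, not the raw class percentages.

```python
def diagnose(percentages, threshold=50):
    # Hypothetical downstream decision rule (an assumption for
    # illustration): flag the sample if the share of Class 4 sperm
    # exceeds the given threshold.
    return percentages[3] > threshold

heuristic = [10, 20, 30, 40]
expert = [10, 30, 20, 40]
# Do the two classifications lead to the same diagnostic decision?
print(diagnose(heuristic) == diagnose(expert))  # → True
```

Over many videos, the fraction of samples where the two decisions agree (or a full ROC analysis treating the expert's decision as ground truth) gives a similarity measure tied directly to how the classification is used.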