Can anyone help me to calculate F-measure collectively ? I know how to calculate recall and precision, but don't know for a given algorithm how to calculate one F-measure value.
As an exemple, suppose my algorithm creates m clusters, but I know there are n clusters for the same data (as created by another benchmark algorithm).
I found one pdf but it is not useful since the collective value I got is greater than 1. Reference of pdf is F Measure explained. Specifically I have read some research paper, in which the author compares two algorithms on the basis of F-measure, they got collectively values between 0 and 1.
if you read the pdf mentioned above carefully, the formula is F(C,K) = ∑ | ci | / N * max {F(ci,kj)}
where ci is reference cluster & kj is cluster created by other algorithm, here i is running from 1 to n & j is running from 1 to m.Let say |c1|=218 here as per pdf N=m*n let say m=12 and n=10, and we got max F(c1,kj) for j=2. Definitely F(c1,k2) is between 0 and 1. but the resultant value calculated by above formula we will get value above 1.
The paper Characterization and evaluation of similarity measures for pairs of clusterings by Darius Pfitzner, Richard Leibbrandt and David Powers contains a lot of useful information regarding this subject, including the following example:
Given the set,
D = {1, 2, 3, 4, 5, 6}
and the partitions,
P = {1, 2, 3}, {4, 5}, {6}, and Q = {1, 2, 4}, {3, 5, 6}
where P is set created by our algorithm and Q is set created by standard algorithm we known
PairsP = {(1, 2), (1, 3), (2, 3), (4, 5)}, PairsQ = {(1, 2), (1, 4), (2, 4), (3, 5), (3, 6), (5, 6)}, and PairsD = {(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 3), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)}
so,
a = | PairsP intersection PairsQ | = |(1, 2)| = 1 b = | PairsP- PairsQ | = |(1, 3)(2, 3)(4, 5)| = 3 c = | PairsQ- PairsP | = |(1, 4)(2, 4)(3, 5)(3, 6)(5, 6)| = 5
F-measure= 2a/(2a+b+c)
Note: There is an error in the publication on page 364 where a, b, c, and d are computed and the result of b and c are actually switched incorrectly. This switch would throw off the results of some other measures. Obviously, the F-measure is unaffected.