weka, confusion-matrix

What is the advantage of using the weighted average F-measure in Weka?


In Weka I have seen the F-measure reported for the 'yes' class and the 'no' class separately. What is the advantage of using the weighted average F-measure to compare the performance of models? Please help me find the answer :)


Solution

  • Let's start with a simple example: classifying protein interactions in text using machine learning. Our classifier assigns each sentence to one of two classes: (1) positive and (2) negative. The positive class contains sentences that describe protein interactions, and the negative class contains sentences that do not. As a researcher, I focus on the classifier's F-score for the positive class. Why? Because I want to know how well it identifies sentences that contain protein interactions, and I do not care about its ability to classify negative sentences. Therefore, I consider only the F-score of the positive class.

    However, for another classic problem such as spam classification, where the classifier sorts emails into two classes, (1) ham and (2) spam, the situation is a bit different. As a researcher, I want to know how well the classifier handles ham as well as spam. I can either check the F-score of each class independently or look at them in aggregate. The weighted average of the ham and spam F-scores is a way to assess the classifier's performance on both classes (or, for multi-class problems, on all classes). The weighted F-measure is the sum of the per-class F-measures, each weighted by the number of instances with that class label, divided by the total number of instances. For two classes, it is calculated as follows:

    Weighted F-Measure = ((F-Measure of 'no' class × number of 'no' instances) + (F-Measure of 'yes' class × number of 'yes' instances)) / total number of instances in the dataset
    

    So, the bottom line: if performance on every class matters to you, use the weighted average of the F-scores of all classes. It gives you a single number for comparing models.
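
To make the formula above concrete, here is a minimal Python sketch that computes the per-class F-measures and their weighted average from a confusion matrix. The confusion matrix counts and the helper names (`f_measure`, `weighted_f_measure`) are made up for illustration. This is not Weka's own code, only the same arithmetic written out by hand.

```python
def f_measure(tp, fp, fn):
    """F-measure (F1) of one class from its true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f_measure(f_by_class, count_by_class):
    """Average of the per-class F-measures, each weighted by its number of actual instances."""
    total = sum(count_by_class.values())
    return sum(f_by_class[c] * count_by_class[c] for c in f_by_class) / total


# Hypothetical confusion matrix (illustrative numbers only).
# Outer key = actual class, inner key = predicted class.
confusion = {
    "yes": {"yes": 40, "no": 10},
    "no": {"yes": 20, "no": 130},
}

classes = list(confusion)
f_by_class = {}
count_by_class = {}
for c in classes:
    tp = confusion[c][c]
    fn = sum(confusion[c][p] for p in classes if p != c)  # actual c, predicted something else
    fp = sum(confusion[a][c] for a in classes if a != c)  # predicted c, actually something else
    f_by_class[c] = f_measure(tp, fp, fn)
    count_by_class[c] = sum(confusion[c].values())  # number of actual instances of class c

weighted = weighted_f_measure(f_by_class, count_by_class)

for c in classes:
    print(f"F-measure for '{c}': {f_by_class[c]:.3f} ({count_by_class[c]} instances)")
print(f"Weighted average F-measure: {weighted:.3f}")
```

With these made-up counts the 'yes' class scores about 0.727 and the 'no' class about 0.897. The weighted average is about 0.854, which sits closer to the 'no' score because 'no' has three times as many instances. Keep this in mind on imbalanced data: the majority class dominates the weighted figure.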