Tags: machine-learning, classification, weka, decision-tree

Why doesn't ID3 algorithm work on the UCI Mushroom dataset in Weka?


I can’t seem to apply the ID3 classification algorithm to the Mushroom.arff dataset. The dataset consists of nominal attributes only. I think I need to preprocess it for ID3 to work, but I don’t know how. How do I proceed?

[Screenshot: ID3 cannot be applied to mushroom.arff in Weka]


Solution

  • ID3 builds an unpruned decision tree and has the following properties:

    1. It can only deal with nominal attributes.
    2. It fails to handle missing values.
    3. Empty leaves may result in unclassified instances.

    The Mushroom dataset consists of 22 nominal attributes, so it satisfies the first condition. On inspection, however, you’ll find that the attribute 'stalk-root' has 2480 (31%) missing values, which violates the second. This is why ID3 is unselectable (greyed out) in Weka by default when you try to classify.
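To confirm which attributes contain missing values outside the Weka GUI, the sketch below counts `?` entries per attribute in ARFF text. It is a minimal illustration, not a full ARFF parser: it assumes a dense (non-sparse) file, attribute names without spaces, and no commas inside quoted values. The function name `count_missing` and the toy `SAMPLE` data are made up for this example.

```python
from collections import Counter

def count_missing(arff_text):
    """Return {attribute_name: count of '?' values} for simple dense ARFF text."""
    names = []
    missing = Counter()
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # Second token is the attribute name, possibly quoted.
                names.append(line.split(None, 2)[1].strip("'\""))
            elif lower.startswith("@data"):
                in_data = True
            continue
        # Data row: an unquoted '?' marks a missing value in ARFF.
        values = [v.strip() for v in line.split(",")]
        for name, value in zip(names, values):
            if value == "?":
                missing[name] += 1
    return {name: missing[name] for name in names}

# Toy data (hypothetical, not the real Mushroom file).
SAMPLE = """\
@relation toy
@attribute 'cap-shape' {x,b}
@attribute 'stalk-root' {b,c,e}
@attribute class {e,p}
@data
x,b,e
b,?,p
x,?,e
x,c,p
"""

print(count_missing(SAMPLE))
```

Run against the real file (for example `count_missing(open("mushroom.arff").read())`), any attribute with a non-zero count is one that will block ID3.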

    To work around the missing values, you can use either of these two solutions.

    1. Remove the attribute.

      • Open the .arff file in the Preprocess tab, tick the stalk-root attribute in the Attributes panel and click Remove.
      • ID3 is now available in the Classify tab. With it I was able to get an F-score of 1.0.


    2. Fill in (impute) the missing values.

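      • If you do not want to lose the information in the attribute (in this case “stalk-root”), you can use one of these techniques:
        1. Replace the missing values with a measure of central tendency for the attribute. For numeric attributes this is the mean or median; for a nominal attribute such as stalk-root, the applicable measure is the mode (the most frequent value). Weka's ReplaceMissingValues filter does this, using the mode for nominal attributes and the mean for numeric ones.
        2. Use the same measure, but computed only over the samples that belong to the same class as the given tuple.
        3. Fill in the missing value with the most probable value, determined by an inference-based method such as a Bayesian formalism.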
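The first two imputation techniques can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than what Weka does internally: rows are lists of strings, `?` marks a missing value, and because the attribute is nominal the mode stands in for the mean or median. The helper names (`mode_of`, `impute_mode`, `impute_class_mode`) and the toy rows are hypothetical.

```python
from collections import Counter

MISSING = "?"

def mode_of(values):
    """Most frequent non-missing value, or None if every value is missing."""
    counts = Counter(v for v in values if v != MISSING)
    return counts.most_common(1)[0][0] if counts else None

def impute_mode(rows, col):
    """Technique 1: replace missing values in `col` with the overall mode."""
    fill = mode_of(row[col] for row in rows)
    out = []
    for row in rows:
        new_row = list(row)
        if new_row[col] == MISSING and fill is not None:
            new_row[col] = fill
        out.append(new_row)
    return out

def impute_class_mode(rows, col, class_col):
    """Technique 2: replace missing values with the mode among same-class rows."""
    overall = mode_of(row[col] for row in rows)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[class_col], []).append(row[col])
    fills = {label: mode_of(vals) for label, vals in by_class.items()}
    out = []
    for row in rows:
        new_row = list(row)
        if new_row[col] == MISSING:
            fill = fills[row[class_col]]
            if fill is None:  # the whole class is missing: fall back to overall mode
                fill = overall
            if fill is not None:
                new_row[col] = fill
        out.append(new_row)
    return out

# Toy rows: [stalk-root, class]. Hypothetical values, not the real dataset.
ROWS = [
    ["b", "e"], ["b", "e"], ["?", "e"],
    ["c", "p"], ["c", "p"], ["c", "p"], ["?", "p"],
]

print(impute_mode(ROWS, col=0))
print(impute_class_mode(ROWS, col=0, class_col=1))
```

In the toy data the overall mode is `c`, so technique 1 fills both gaps with `c`, while technique 2 fills the gap in class `e` with `b` and the gap in class `p` with `c`. Neither function modifies the input rows.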