
Why is the decision tree not working as expected in WEKA?


I am following the book "Machine Learning: Hands-On for Developers and Technical Professionals" to create a decision tree with WEKA. Although I followed the same process as shown in the book, I am not getting the same decision tree. I am using the C4.5 (J48) algorithm.

Data (arff file)

@relation ladygaga

@attribute placement {end_rack, cd_spec, std_rack}
@attribute prominence numeric
@attribute pricing numeric
@attribute eye_level {TRUE, FALSE}
@attribute customer_purchase {yes, no}

@data
end_rack,85,85,FALSE,yes
end_rack,80,90,TRUE,yes
cd_spec,83,86,FALSE,no
std_rack,70,96,FALSE,no
std_rack,68,80,FALSE,no
std_rack,65,70,TRUE,yes
cd_spec,64,65,TRUE,yes
end_rack,72,95,FALSE,yes
end_rack,69,70,FALSE,no
std_rack,75,80,FALSE,no
end_rack,75,70,TRUE,no
cd_spec,72,90,TRUE,no
cd_spec,81,75,FALSE,yes
std_rack,71,91,TRUE,yes

Expected Output (image)

My Output (image)

What am I doing wrong?


Solution

  • This is a problem with the book. (I am keeping the answer here so that it can help other readers of the book.)

    The book expects only one negative case in the end_rack category (look for (5,1) in the author's tree diagram: five instances reach that leaf and one of them is misclassified). In the data provided in the book, and even on the book's website, there are actually two negative cases (5,2). I changed one negative end_rack case (end_rack,69,70,FALSE) from no to yes and got the same decision tree as the book.

    Here is the corrected data (arff file):

    @relation ladygaga
    
    @attribute placement {end_rack, cd_spec, std_rack}
    @attribute prominence numeric
    @attribute pricing numeric
    @attribute eye_level {TRUE, FALSE}
    @attribute customer_purchase {yes, no}
    
    @data
    end_rack,85,85,FALSE,yes
    end_rack,80,90,TRUE,yes
    cd_spec,83,86,FALSE,no
    std_rack,70,96,FALSE,no
    std_rack,68,80,FALSE,no
    std_rack,65,70,TRUE,yes
    cd_spec,64,65,TRUE,yes
    end_rack,72,95,FALSE,yes
    end_rack,69,70,FALSE,yes
    std_rack,75,80,FALSE,no
    end_rack,75,70,TRUE,no
    cd_spec,72,90,TRUE,no
    cd_spec,81,75,FALSE,yes
    std_rack,71,91,TRUE,yes
    

    Correct Output (image: decision tree from corrected data)
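
The (5,2) versus (5,1) difference described in the answer can be checked without WEKA by counting class labels per placement value. The following is a minimal sketch, assuming the rows are exactly those listed in this thread; the helper name `class_counts_by_placement` is hypothetical and not part of WEKA, and the script only counts labels, it does not build a J48 tree.

```python
from collections import Counter, defaultdict

# Rows copied from the original @data section in the question.
ORIGINAL = """\
end_rack,85,85,FALSE,yes
end_rack,80,90,TRUE,yes
cd_spec,83,86,FALSE,no
std_rack,70,96,FALSE,no
std_rack,68,80,FALSE,no
std_rack,65,70,TRUE,yes
cd_spec,64,65,TRUE,yes
end_rack,72,95,FALSE,yes
end_rack,69,70,FALSE,no
std_rack,75,80,FALSE,no
end_rack,75,70,TRUE,no
cd_spec,72,90,TRUE,no
cd_spec,81,75,FALSE,yes
std_rack,71,91,TRUE,yes
"""

# The corrected file in the answer differs in exactly one row:
# the label of end_rack,69,70,FALSE is yes instead of no.
CORRECTED = ORIGINAL.replace(
    "end_rack,69,70,FALSE,no", "end_rack,69,70,FALSE,yes"
)


def class_counts_by_placement(rows):
    """Return {placement: Counter({class_label: count})} for CSV rows.

    Hypothetical helper for illustration: the first field is treated as
    the placement attribute and the last field as customer_purchase.
    """
    counts = defaultdict(Counter)
    for line in rows.strip().splitlines():
        fields = line.split(",")
        counts[fields[0]][fields[-1]] += 1
    return counts


if __name__ == "__main__":
    for name, rows in (("original", ORIGINAL), ("corrected", CORRECTED)):
        print(name)
        for placement, counter in class_counts_by_placement(rows).items():
            print(f"  {placement}: yes={counter['yes']} no={counter['no']}")
```

With the data above, end_rack has 3 yes and 2 no in the original file and 4 yes and 1 no in the corrected file, while the cd_spec and std_rack counts are unchanged.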