Tags: python, python-2.7, machine-learning, naivebayes, data-science

Naive Bayes Classifier: Only get 30-40% accuracy on iris data set


I've been trying to implement a Naive Bayes classifier in Python for the last few days, using the Iris data set from UCI (http://archive.ics.uci.edu/ml/datasets/Iris). When I classify 100 random samples, I get only 30-40% accuracy. I think my probability function is right, because I tested it against the example from Wikipedia (https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Examples).

Now here's what I do:

  • I load the data
  • I divide the data into 3 classes
  • I calculate the mean and variance for each class
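For readers who can't open the pastebin, the three training steps above could look roughly like the sketch below. This is not the code from the question: the function name `fit_gaussian_nb`, the returned dictionary layout, and the toy data are assumptions made up for illustration, and the variance uses the sample (n - 1) estimator, which may differ from what the original code does.

```python
from __future__ import division  # keeps n / len(y) a float on Python 2.7 too

from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Return {label: (prior, means, variances)} estimated from the training data."""
    # Step 2: divide the rows into one group per class label.
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    # Step 3: per-class mean and variance of every feature.
    params = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple per feature
        means = [sum(col) / n for col in columns]
        # Sample variance (divides by n - 1); needs at least two rows per class.
        variances = [
            sum((v - m) ** 2 for v in col) / (n - 1)
            for col, m in zip(columns, means)
        ]
        params[label] = (n / len(y), means, variances)
    return params


# Toy data, made up for illustration (not real Iris measurements).
X_toy = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [14.0, 24.0]]
y_toy = ["a", "a", "b", "b"]
params = fit_gaussian_nb(X_toy, y_toy)
```

With the real data set, `X` would hold the four measurements per flower and `y` the species names.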

Then for 100 random samples I:

  • calculate, for each class, the probability of each feature value given that class
  • calculate the posterior numerator for each class by multiplying those probabilities together
  • store the values in a list and get the index of the highest value
  • compare the class at that index with the real class of the sample (check whether the prediction is right)
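The per-sample steps above can be sketched as follows, assuming a Gaussian likelihood per feature (as in the Wikipedia example linked in the question). The names `gaussian_pdf`, `predict`, and `toy_params` are hypothetical and the parameter values are invented; this is an illustration of the idea, not the code from the pastebin.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def predict(sample, params):
    """Return the label with the largest posterior numerator for one sample."""
    best_label, best_score = None, -1.0
    for label, (prior, means, variances) in params.items():
        # Posterior numerator: prior times the product of per-feature likelihoods.
        score = prior
        for x, m, v in zip(sample, means, variances):
            score *= gaussian_pdf(x, m, v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical parameters for two classes and two features:
# {label: (prior, per-feature means, per-feature variances)}
toy_params = {
    "a": (0.5, [2.0, 3.0], [2.0, 2.0]),
    "b": (0.5, [12.0, 22.0], [8.0, 8.0]),
}
```

Calling `predict([2.5, 3.5], toy_params)` picks class "a" here, since that sample sits close to the means of "a". With many features, the product can underflow, so summing log-probabilities is a common alternative; with four Iris features a plain product is usually fine.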

Somehow I only get 30-40%. Am I doing something wrong?

If you want to see the code, it's here: http://pastebin.com/sUYm97qi


Solution

  • LOL -- you wrote very concise/clean code so I was very confused until I saw the very end.

    You were comparing classes[max_index], the class name of the prediction, with y[max_index], the label of the max_index-th instance, instead of with y[q], the label of the sample you are actually classifying.

    Try changing your code to

    if classes[max_index] == y[q]:
        corr += 1
    

    You should get around 96% accuracy.
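A small sketch of why the original comparison tends to land near one third. It assumes `y` is still in the order of the UCI file, where the rows are grouped by species, so `y[0]`, `y[1]` and `y[2]` all carry the same label; I can't confirm that from the question itself. The list `predicted_index` is hypothetical and pretends the classifier picked the right class for every sample, and the data is shortened to three rows per class.

```python
classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Labels in file order (shortened to 3 per class for this toy example).
y = ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3

# Hypothetical predictions: the correct class index for every sample.
predicted_index = [classes.index(label) for label in y]

buggy = 0
fixed = 0
for q, max_index in enumerate(predicted_index):
    # max_index is 0, 1 or 2, so this only ever looks at the first three labels.
    if classes[max_index] == y[max_index]:
        buggy += 1
    # This compares against the label of the sample being classified.
    if classes[max_index] == y[q]:
        fixed += 1
```

Under these assumptions even a perfect classifier is scored as correct only when it predicts the first class, which is 3 of the 9 toy samples, while the `y[q]` comparison counts all 9.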