Tags: algorithm, machine-learning, classification, weka, knn

Is kNN a statistical classifier?


I'm currently working on a Machine Learning project for my Artificial Intelligence exam. The goal is to choose two classification algorithms to compare using WEKA, bearing in mind that the two must differ enough for the comparison to be worthwhile. The algorithms must also handle both nominal and numeric data (I suppose this is required for the comparison to be possible). My professor suggested choosing a statistical classifier and a decision tree classifier, for example, or comparing a bottom-up classifier with a top-down one.

Since I have very little experience in the Machine Learning field, I am doing some research on the various algorithms WEKA offers, and I came across kNN, that is, the k-nearest neighbors algorithm. Is it statistical? And could it be compared with a Decision Stump algorithm, for example?

Otherwise, can you suggest a pair of algorithms that match the requirements I have described above?

P.S.: The data to be handled must be both numeric and nominal. In WEKA there are numeric/nominal features and numeric/nominal classes. Do I have to choose algorithms that support both numeric and nominal features AND classes, or just one of them?

I would really appreciate any help, guys. Thanks for your patience!


Solution

  • Based on your professor's description, I would not consider k-Nearest Neighbors (kNN) a statistical classifier. In most contexts, a statistical classifier is one that generalizes via statistics of the training data (either by using the statistics directly or by transforming them). An example is the Naïve Bayes classifier, which summarizes the training data as class priors and per-class feature distributions.

    By contrast, kNN is an example of Instance-Based Learning. It doesn't use statistics of the training data; rather, it compares new observations directly to the training instances to perform classification.

    With regard to comparison: yes, you can compare the performance of kNN with that of a Decision Stump (or any other classifier). Any two supervised classifiers each yield a classification accuracy on your training/testing data, so those accuracies can be compared directly.
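
To make the distinction concrete, here is a minimal Python sketch (not WEKA code) that puts an instance-based kNN next to a Gaussian Naïve Bayes and compares their accuracy on a held-out test set. The toy data, the function names, the choice of k = 3, and the Gaussian assumption for numeric features are all made up for illustration. In WEKA itself the analogous classifiers would be IBk and NaiveBayes, and nominal features would need extra handling that this sketch leaves out.

```python
import math
from collections import Counter

# Toy data invented for illustration: (numeric feature vector, class label).
train = [
    ((1.0, 1.2), "a"), ((1.3, 0.9), "a"), ((0.8, 1.1), "a"),
    ((3.0, 3.2), "b"), ((3.3, 2.9), "b"), ((2.8, 3.1), "b"),
]
test = [((1.1, 1.0), "a"), ((0.9, 1.3), "a"), ((3.1, 3.0), "b"), ((2.9, 2.8), "b")]


def knn_predict(train, x, k=3):
    """Instance-based: keep every training instance and vote among the k closest."""
    nearest = sorted(train, key=lambda inst: math.dist(inst[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_gaussian_nb(train):
    """Statistical: reduce the training set to per-class priors, means and variances."""
    model = {}
    for label in {lab for _, lab in train}:
        rows = [feats for feats, lab in train if lab == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small constant avoids division by zero
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(train), means, variances)
    return model


def nb_predict(model, x):
    """Pick the class with the highest log posterior under the fitted statistics."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def accuracy(predict, test):
    """Fraction of test instances whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)


nb_model = fit_gaussian_nb(train)  # the training data is not needed after this point
knn_acc = accuracy(lambda x: knn_predict(train, x), test)  # kNN still needs `train`
nb_acc = accuracy(lambda x: nb_predict(nb_model, x), test)
print(f"kNN accuracy: {knn_acc:.2f}, Naive Bayes accuracy: {nb_acc:.2f}")
```

Note that `nb_predict` only sees the fitted statistics, while `knn_predict` has to be handed the training instances on every call. Both classifiers are scored by the same `accuracy` function on the same test set, which is what makes the two numbers comparable.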