machine-learning, classification, weka

How can I use WEKA Machine Learning software to classify the following type of data?


I have a .csv file which consists of 10 columns. The first 9 are related to the properties of a particular item, while the 10th column has the "Class" which states which item it is.

I am trying to run the following classifiers -

  • Naive Bayes
  • ZeroR
  • IBk
  • Neural Network

I am having some trouble proceeding. I am supposed to split my data so that the first half is used for training and the second half is used to test the results.

I begin by going to the "Explorer" and opening the .csv file. I select all the attributes, including "CLASS", and then go to the Classify tab.

From there, I select "Percentage split", set it to 50%, and simply "Start" each of the classifiers listed above.

So these are the questions -

  • Is this the right method?
  • Do I need to include the "CLASS" column as an attribute too?
  • What kind of modifications can I make in the GUI to improve the test results for the classifiers without changing the data? I am also trying to understand how these algorithms work in WEKA, so I want to try different things.

Can anyone help me with this?

Thanks!


Solution

    • Yes, the method is right (for Weka, anyway).
    • Yes, you need to include the CLASS column, particularly for algorithms that require supervised training. It supplies the known answers the algorithm is trained against. Without it, how would the trainer know what the answer should be?
    • You can try adjusting the classifiers' parameters, but you should do this to get a better response to the TRAINING data. Of course, there is always the possibility of overfitting. If you allow the testing to influence the training, then you have just used the test data as an auxiliary training set; it is no longer test data.
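
    To make the first two points concrete, here is a rough sketch in plain Python (not Weka code) of what a 50% split with a ZeroR baseline amounts to. The toy rows, column layout, and function names are all made up for illustration; the only thing taken from the question is that the class label sits in the last column. One caveat worth checking in your own setup: as far as I know, Weka's "Percentage split" shuffles the rows before splitting unless you tick "Preserve order for % split" under "More options...", so a literal first-half/second-half split needs that box ticked.

```python
from collections import Counter

# Hypothetical toy rows standing in for the CSV: feature columns first, class label last.
rows = [
    (5.1, 3.5, "a"),
    (4.9, 3.0, "a"),
    (6.2, 2.9, "b"),
    (5.0, 3.4, "a"),
    (6.4, 3.2, "b"),
    (5.2, 3.6, "a"),
    (6.0, 2.7, "b"),
    (6.1, 3.0, "b"),
]


def split_in_order(data, train_fraction=0.5):
    """Return (train, test) without shuffling: the first part trains, the rest tests."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]


def train_zero_r(train_rows):
    """ZeroR ignores the features and remembers the most common class label."""
    labels = [row[-1] for row in train_rows]
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted_label, test_rows):
    """Fraction of test rows whose true class equals the constant prediction."""
    hits = sum(1 for row in test_rows if row[-1] == predicted_label)
    return hits / len(test_rows)


train, test = split_in_order(rows)
majority = train_zero_r(train)   # learned from the class column of the training half only
baseline = accuracy(majority, test)
print(f"ZeroR predicts {majority!r}; accuracy on the held-out half: {baseline:.2f}")
```

    The class column is used twice here: once to learn from the training half, and once as the answer key for the test half. ZeroR is only a baseline; the other classifiers are worth keeping if they beat its score on the same held-out half.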

    Someone asked a similar question here: "How to build a good training data set for machine learning and predictions?" The two look like different questions but involve the same considerations.
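
    As an illustration of the point about not letting the test data influence training, here is a small plain-Python sketch (again, not Weka code) that picks k for a k-nearest-neighbour classifier using a validation slice cut from the training half, and only then scores the held-out half once. The data, the candidate values of k, and the helper names are assumptions for the example; in Weka the corresponding knob would be the KNN parameter of IBk.

```python
import math
from collections import Counter

# Hypothetical toy data: two feature columns, class label last.
train_half = [
    (1.0, 1.0, "a"), (5.0, 5.0, "b"), (1.2, 0.8, "a"), (4.8, 5.2, "b"),
    (1.1, 1.2, "b"),  # deliberately noisy label inside the "a" cluster
    (5.1, 4.9, "b"), (0.9, 1.1, "a"), (4.9, 5.1, "b"),
    (1.1, 1.3, "a"), (5.0, 4.8, "b"),  # last two rows double as the validation slice
]
test_half = [(1.0, 0.9, "a"), (5.2, 5.0, "b"), (0.8, 1.2, "a"), (5.0, 4.7, "b")]


def knn_predict(train_rows, features, k):
    """Majority vote among the k training rows closest to `features`."""
    nearest = sorted(train_rows, key=lambda row: math.dist(row[:-1], features))[:k]
    return Counter(row[-1] for row in nearest).most_common(1)[0][0]


def accuracy(train_rows, eval_rows, k):
    """Fraction of eval_rows classified correctly by k-NN fitted on train_rows."""
    hits = sum(1 for row in eval_rows if knn_predict(train_rows, row[:-1], k) == row[-1])
    return hits / len(eval_rows)


def choose_k(train_rows, candidates, validation_size=2):
    """Pick k using a validation slice taken from the training rows only."""
    fit_rows = train_rows[:-validation_size]
    validation_rows = train_rows[-validation_size:]
    return max(candidates, key=lambda k: accuracy(fit_rows, validation_rows, k))


best_k = choose_k(train_half, candidates=[1, 3, 5])
# The test half is consulted exactly once, after k has been fixed.
test_accuracy = accuracy(train_half, test_half, best_k)
print(f"chosen k = {best_k}, accuracy on the held-out half = {test_accuracy:.2f}")
```

    On this toy data k=1 is misled by the noisy row in the validation step, so a larger k is chosen. The design point is that `choose_k` never sees `test_half`; if you instead kept re-running with different settings until the test score looked best, that score would stop being an honest estimate.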