machine-learning classification weka cross-validation training-data

What do Weka's different test options mean?


So I've recently started using Weka, and there are several test options when building a tree with, for example, J48. The options are listed below, along with my understanding of them:

  1. Use training set - All I know is that its results are highly optimistic and not necessarily useful. Even Weka's documentation at 2.1.5 isn't very specific.
  2. Supplied test set - Pretty self-explanatory: you supply a separate test set.
  3. Cross-Validation - I understood it by reading this short example.
  4. Percentage Split - I assume it means partitioning the data set into two sets of a certain percentage, one set for training and one for testing.

What I want to know is what exactly the training set (first option) is and what it does. Where does Weka get this training set from, and what data does it test on exactly? Also, please correct my understanding of the rest if it's wrong.


Solution

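  • The first option simply means "use all data loaded to run this algorithm": the model is built on the full loaded data set and then evaluated on that same data, which is why the results look optimistic. You choose this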

    • to try things out,
    • to have a first look at the results-section in the output,
    • to check the performance/run duration,
    • to check if Weka's output matches an implementation of the same algorithm in different software, say R or MATLAB.
    • ...
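
To make the difference between the options concrete, here is a rough Python sketch of which rows each option trains on and which rows it tests on. This is not Weka's code. The function names are made up for illustration, the 66% split and 10 folds are assumptions meant to mirror the usual Explorer defaults, and any stratification Weka may apply during cross-validation is left out. The supplied-test-set option is omitted because its test rows simply come from a second file.

```python
import random


def use_training_set(n_rows):
    """Train on every loaded row and evaluate on those same rows."""
    rows = list(range(n_rows))
    return rows, rows


def percentage_split(n_rows, train_percent=66, seed=1):
    """Shuffle the rows, train on the first train_percent, test on the rest."""
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    cut = n_rows * train_percent // 100
    return rows[:cut], rows[cut:]


def cross_validation(n_rows, folds=10, seed=1):
    """Yield one (train, test) pair per fold; each row is tested exactly once."""
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    for k in range(folds):
        test = rows[k::folds]          # every folds-th row, starting at offset k
        held_out = set(test)
        train = [r for r in rows if r not in held_out]
        yield train, test


if __name__ == "__main__":
    train, test = use_training_set(10)
    print("use training set: overlap =", len(set(train) & set(test)))
    train, test = percentage_split(100)
    print("percentage split: overlap =", len(set(train) & set(test)))
```

In this sketch only `use_training_set` hands back the same rows for training and testing. The other options always test on rows the model has not seen, so their accuracy figures are the ones to trust.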