
Training Bayesian Classifier


I'm trying to train and test a Bayesian Classifier in Python.

These lines of code are from an example I found here, but I don't understand what they do.

train_labels = np.zeros(702)
train_labels[351:701] = 1
train_matrix = extract_features(train_dir)

There is a similar code block later in the test set:

test_matrix = extract_features(test_dir)
test_labels = np.zeros(260)
test_labels[130:260] = 1

I'm wondering what this does and how I can apply it to a different classification example. What do the numbers in [] mean? Many thanks.


Solution

  • The example code referenced in your post trains a binary classifier with two models, Naive Bayes and SVC.

    train_labels = np.zeros(702)
    train_labels[351:701] = 1
    train_matrix = extract_features(train_dir)
    

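    This creates a label array for 702 records, all initially 0, and then sets the later half to 1. The numbers in [] are a slice, start:stop, and the assignment writes 1 into every position from start up to, but not including, stop. So [351:701] sets indices 351 through 700, and the last record (index 701) stays 0; if the whole second half is meant to be 1, [351:] or [351:702] would cover it. The labels are binary, like spam or ham, true or false, etc., and they are assigned purely by position, so this only works if the rows of train_matrix come out with all class-0 records first and all class-1 records after them. The extract_features function builds the {(docid, wordid)->wordcount,..} matrix which is the input to these models.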

    Once you have trained the model, you need to see how well it performs against a test set. Here the test set has 260 records, with the first half (indices 0 to 129) labeled 0 and the later half (indices 130 to 259) labeled 1.

    test_matrix = extract_features(test_dir)
    test_labels = np.zeros(260)
    test_labels[130:260] = 1
    

    Finally, you run each model's predictions on the test set and compare them with test_labels to see how accurate both models (NB and SVC) are.
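
As a rough sketch of the labeling and scoring steps described above, assuming only NumPy: the helper make_binary_labels and the predicted array are hypothetical names made up for illustration, not part of the original example, and the predictions are invented just to show the accuracy calculation.

```python
import numpy as np

def make_binary_labels(n_class0, n_class1):
    # Hypothetical helper: class-0 records come first, class-1 records after them.
    labels = np.zeros(n_class0 + n_class1)
    labels[n_class0:] = 1  # open-ended slice reaches the last element
    return labels

# The slice from the question: the stop index (701) is exclusive.
train_labels = np.zeros(702)
train_labels[351:701] = 1
n_ones = int(train_labels.sum())   # counts the entries set by the slice (indices 351..700)
last_label = train_labels[701]     # index 701 is not touched by the slice

# Same labels as the test block in the question: 130 zeros, then 130 ones.
test_labels = make_binary_labels(130, 130)

# Made-up predictions, only to show how accuracy is computed.
predicted = test_labels.copy()
predicted[:13] = 1                 # pretend 13 class-0 records were misclassified
accuracy = np.mean(predicted == test_labels)
print(n_ones, last_label, accuracy)
```

To apply this to a different dataset, you would replace the counts with the number of records in each of your two classes, and make sure the feature rows are ordered the same way (all of class 0, then all of class 1).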