matlab, classification, text-classification

Implementation of text classification in MATLAB with Naive Bayes


I want to implement text classification with the Naive Bayes algorithm in MATLAB. So far I have three matrices:

  1. Class priors (8*2 cell: 8 class names and, for each class, its percentage of the training set).
  2. Training data: a word-count matrix (15000*9 cell: for each class, the count of every feature (word); the last column is each word's count over all the documents).
  3. Test data: a 2000*1 cell, where each cell holds the list of words that represents a document.

What should I do now? I want to calculate recall and precision for the test set. I took a look at the MATLAB Naive Bayes functions, and it is supposed to be simple, but I'm not sure how and where to start.

Thanks


Solution

  • Here is an example of Naive Bayes classification,

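    % Class 1: x > 0, y > 0
    x1 = 5 * rand(100,1);
    y1 = 5 * rand(100,1);
    data1 = [x1,y1];
    % Class 2: x < 0, y > 0
    x2 = -5 * rand(100,1);
    y2 =  5 * rand(100,1);
    data2 = [x2,y2];
    % Class 3: x < 0, y < 0
    x3 = -5 * rand(100,1);
    y3 = -5 * rand(100,1);
    data3 = [x3,y3];
    % First 50 points of each class for training, the remaining 50 for testing
    traindata = [data1(1:50,:);data2(1:50,:);data3(1:50,:)];
    testdata = [data1(51:100,:);data2(51:100,:);data3(51:100,:)];
    % One label per training row, in the same order as traindata
    label = [repmat('x+y+',50,1);repmat('x-y+',50,1);repmat('x-y-',50,1)];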
    

    That was my data, three classes. Now the classification,

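    % Note: in newer MATLAB releases, NaiveBayes.fit has been replaced by fitcnb(traindata, label).
    nb = NaiveBayes.fit(traindata, label);
    % Predicted class label for every row of the test data
    ClassifierOut = predict(nb,testdata);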
    

    I think you should convert your data from a cell array to a numeric matrix, but the labels are okay as they are.
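    For the word-count data described in the question, one possible route is to score documents by hand with a multinomial Naive Bayes model. The sketch below is only an illustration under stated assumptions: the names vocab, classNames, wordCounts, priors and testDocs are hypothetical stand-ins for the asker's data, the toy numbers are made up, and the add-one (Laplace) smoothing is my choice, not something given in the question. It uses base MATLAB functions only.

```matlab
% Hypothetical toy inputs standing in for the question's data.
vocab      = {'ball', 'goal', 'vote', 'law'};   % V words (features)
classNames = {'sport', 'politics'};             % K classes
wordCounts = [30  2;                            % V-by-K numeric matrix:
              25  1;                            % count of each word in each class
               3 40;
               1 35];
priors     = [0.5 0.5];                         % 1-by-K class priors
testDocs   = {{'ball', 'goal', 'goal'}, {'vote', 'law', 'ball'}};

V = numel(vocab);
N = numel(testDocs);

% Turn each document (a cell array of words) into a row of word counts.
X = zeros(N, V);
for d = 1:N
    [found, idx] = ismember(testDocs{d}, vocab);   % words outside vocab are skipped
    subs = reshape(idx(found), [], 1);             % column vector of word indices
    X(d, :) = accumarray(subs, 1, [V 1])';
end

% Assumption: add-one (Laplace) smoothing, so that a word never seen
% in a class does not produce log(0).
logLik = log(bsxfun(@rdivide, wordCounts + 1, sum(wordCounts, 1) + V));  % V-by-K

% Log-posterior up to a constant:
% log prior + sum over words of (count in document) * log P(word | class).
scores = bsxfun(@plus, X * logLik, log(priors));   % N-by-K

[~, best] = max(scores, [], 2);                    % index of the best class per document
predicted = classNames(best);
```

    With real data, wordCounts would presumably be the numeric per-class columns of the training cell (without the final "all documents" column), converted to a numeric matrix first.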

    Here are the results: blue is the training data, and the rest is the classifier output for the three classes.

    [Figure: training data and classifier output]

    You can also see here for how to calculate recall and precision for multi-class data.
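    As a rough illustration of the multi-class calculation, per-class precision and recall can be read off a confusion matrix. The sketch below assumes the true and predicted labels have already been mapped to class indices 1..K; trueIdx and predIdx are hypothetical names and the label values are made up for the example. It builds the confusion matrix with accumarray, so no toolbox function is needed.

```matlab
% Hypothetical true and predicted labels, encoded as class indices 1..K.
trueIdx = [1 1 1 2 2 2 3 3 3 3]';
predIdx = [1 1 2 2 2 2 3 3 3 1]';
K = 3;

% Confusion matrix: rows are true classes, columns are predicted classes.
C = accumarray([trueIdx predIdx], 1, [K K]);

tp        = diag(C);              % correctly classified documents per class
precision = tp ./ sum(C, 1)';     % tp / everything predicted as that class
recall    = tp ./ sum(C, 2);      % tp / everything truly in that class

% Macro-averages: unweighted mean over the classes.
macroPrecision = mean(precision);
macroRecall    = mean(recall);
```

    A class that is never predicted gives 0/0 = NaN for its precision, so that case needs separate handling. String labels can be mapped to indices first, for example with ismember against the list of class names.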