Tags: php, statistics, document-classification, expectation-maximization

Expectation-maximization algorithm implementation with Naive Bayes


I've implemented Naive Bayes document classification with good text filtering, and I'm getting acceptable statistical results with good accuracy. I'd like to improve my results using an EM algorithm.

But I don't know whether I should apply the EM algorithm on top of the Naive Bayes results, or apply it to the data and start over from scratch, so that I can compare the results.

In either case, I need to understand how the EM algorithm applies to this problem, because it really confuses me.

Any well-explained documents would be appreciated.


Solution

  • EM generally helps you with unlabeled data. If you have some unlabeled data, you basically use it in a cycle like this:

    estimate some initial parameters, perhaps even randomly
    while not converged:
      relabel data using estimates
      update estimates using new labels
    

    If you are doing supervised learning, the relabel step is blowing away your labels, and is likely to make your classification worse.

    On the other hand, this is a nice, detailed tutorial on semi-supervised Naive Bayes for text classification. If you have a small set of labelled documents and a large set of unlabelled documents, you can use the labelled set to estimate the initial parameters, then run the iterative steps on the unlabelled data, and end up with a better classifier; a rough code sketch of that workflow is given below.
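
    The following is only a minimal sketch of that semi-supervised loop, not the tutorial's exact method: it assumes Python with scikit-learn (CountVectorizer, MultinomialNB) and uses a hard-label, self-training style E-step rather than soft posterior weights; the toy documents, labels, and variable names are illustrative assumptions only.

    import numpy as np
    from scipy.sparse import vstack
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Toy data: a small labelled set plus an unlabelled set (placeholders only).
    labelled_docs = ["cheap meds online", "meeting at noon",
                     "win a free prize", "lunch tomorrow?"]
    labels = np.array([1, 0, 1, 0])  # 1 = spam, 0 = ham (illustrative)
    unlabelled_docs = ["free meds", "noon meeting moved",
                       "prize inside", "see you at lunch"]

    vec = CountVectorizer()
    X_lab = vec.fit_transform(labelled_docs)
    X_unlab = vec.transform(unlabelled_docs)

    # 1. Estimate initial parameters from the labelled documents only.
    clf = MultinomialNB()
    clf.fit(X_lab, labels)

    # 2. Iterate: relabel the unlabelled data with the current model, then
    #    re-estimate the parameters from labelled + pseudo-labelled data,
    #    stopping once the pseudo-labels no longer change.
    prev = None
    for _ in range(20):
        pseudo = clf.predict(X_unlab)            # E-step (hard assignment)
        if prev is not None and np.array_equal(pseudo, prev):
            break                                # converged
        prev = pseudo
        X_all = vstack([X_lab, X_unlab])
        y_all = np.concatenate([labels, pseudo])
        clf = MultinomialNB().fit(X_all, y_all)  # M-step

    print(clf.predict(vec.transform(["free prize meds"])))

    A proper soft EM would instead weight each unlabelled document by its class posteriors (for example via MultinomialNB's sample_weight argument when refitting), which is closer to what the tutorial describes; the hard-label version above is just the simplest way to see the loop.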